[COMMITTEE PRINT] 


86TH CONGRESS } SENATE { REPORT
2d Session } { No.


DOCUMENTATION, INDEXING, AND 
RETRIEVAL OF SCIENTIFIC 
INFORMATION 


REPORT 


OF THE 


COMMITTEE ON GOVERNMENT OPERATIONS 
UNITED STATES SENATE 




MAY 24, 1960 


Printed for the use of the Committee on Government Operations 


UNITED STATES 
GOVERNMENT PRINTING OFFICE 
WASHINGTON : 1960 








COMMITTEE ON GOVERNMENT OPERATIONS
JOHN L. McCLELLAN, Arkansas, Chairman 


HENRY M. JACKSON, Washington KARL E. MUNDT, South Dakota 
SAM J. ERVIN, Jr., North Carolina CARL T. CURTIS, Nebraska 
HUBERT H. HUMPHREY, Minnesota JACOB K. JAVITS, New York 


ERNEST GRUENING, Alaska 
EDMUND S. MUSKIE, Maine
WALTER L. REYNOLDS, Chief Clerk and Staff Director
GLENN K. SHRIVER, Professional Staff Member
ELI E. NOBLEMAN, Professional Staff Member
MILES SCULL, Jr., Professional Staff Member
W. E. O'BRIEN, Professional Staff Member
ARTHUR A. SHARP, Professional Staff Member





PERMANENT SUBCOMMITTEE ON INVESTIGATIONS 


JOHN L. McCLELLAN, Arkansas, Chairman


HENRY M. JACKSON, Washington KARL E. MUNDT, South Dakota 
SAM J. ERVIN, Jr., North Carolina CARL T. CURTIS, Nebraska 
EDMUND S. MUSKIE, Maine


DONALD F. O'DONNELL, Acting Chief Counsel





SUBCOMMITTEE ON REORGANIZATION AND INTERNATIONAL ORGANIZATIONS 


HUBERT H. HUMPHREY, Minnesota, Chairman 


JOHN L. McCLELLAN, Arkansas HOMER E. CAPEHART, Indiana
ERNEST GRUENING, Alaska KARL E. MUNDT, South Dakota
EDMUND S. MUSKIE, Maine


JULIUS N. CAHN, Director of Medical Research Project





SUBCOMMITTEE ON NATIONAL POLICY MACHINERY


HENRY M. JACKSON, Washington, Chairman 


HUBERT H. HUMPHREY, Minnesota KARL E. MUNDT, South Dakota 
EDMUND S. MUSKIE, Maine JACOB K. JAVITS, New York


J. K. MANSFIELD, Staff Director 


CONTENTS 


Objective
Background
S. 493, 80th Congress, Technical Information and Services Act, 1947
Creation of National Science Foundation
Proposed Science and Technology Act of 1958
National Defense Education Act of 1958
Science Program—86th Congress (S. Rept. 120)
Establishment of a Commission on a Department of Science and
Technology

Need for improvement of science information systems
Summary of agency activities
Department of Defense (DOD)
Armed Services Technical Information Agency (ASTIA)
Current ARDC technical efforts (CATE program)
Office of Naval Research (ONR)
Office of Defense Research and Engineering
Bio-Sciences Information Exchange (BSIE), Smithsonian Institution
Atomic Energy Commission (AEC)
Central Intelligence Agency (CIA)
Library of Congress
Department of Commerce
U.S. Patent Office
National Bureau of Standards (NBS)
Data processing operations
National Library of Medicine
National Library of Agriculture (USDA)
National Science Foundation (NSF)
Other Federal agencies
Department of Agriculture
Department of the Treasury
Department of Health, Education, and Welfare
Federal Communications Commission
Interstate Commerce Commission
[illegible]
National Academy of Sciences—National Research Council
Cooperation of Industry


Part I 
ACTIVITIES AND PROGRAMS OF FEDERAL AGENCIES 


Armed Services Technical Information Agency 
Atomic Energy Commission
Nuclear Science Abstracts
NSA indexing pattern
Advance planning for subject and corporate indexes
[illegible] and coding of subject and corporate main heading cards
Daily workloads for all regular NSA issues
Preparation of journal reproduction copy
Mechanical arrangement of indexes
Mechanized camera and page makeup
Preparation of cumulated indexes
Elimination of card catalogs
Research and development reports
Proceedings of scientific symposia and meetings
[illegible]
Technical progress reviews
Bibliographies
Translations
Engineering drawings
Technical films
[illegible]
Information [illegible]
Bio-Sciences Information Exchange
Status of registration of extramural and intramural Government research


Central Intelligence Agency 







Commerce, Department of 
National Bureau of Standards
Data services
Equipment developments
NBS publications
Bureau of Public Roads
Coast and Geodetic Survey
[illegible]
U.S. Patent Office
Data-processing [illegible] of department
Library of Congress
Mechanization of services and functions
Problems of information retrieval in public libraries
National Science Foundation
Current developments in scientific information
Flow of information
Research on scientific information problems
Problems at the source
Primary and secondary publications
Data and reference services
Foreign science information
Appendix A.—Projects in the general area of mechanized handling of
information supported by the NSF
Appendix B.—Examples of activities of other Government agencies
pertaining to mechanized handling of information
Appendix C.—Examples of non-Government activities which are
developing or are supporting the development of mechanized
systems
Appendix D.—Physical sciences information exchange
Appendix E.—Numbers of abstracts or title listings published by the
National Federation of Science Abstracting and Indexing Services
U.S. Department of Agriculture Library


Part II 


SCIENCE INFORMATION AND RETRIEVAL SYSTEMS 
AND PROGRAMS OF NONGOVERNMENT GROUPS 


American Institute of Biological Sciences (AIBS)
[illegible]
Bell Telephone Laboratories
Chemical Abstracts Service
Documentation Incorporated
E. I. du Pont de Nemours & Co.
Appendix.—A practical system for documenting building research
Esso Research & Engineering Co.
[illegible]
International Business Machines Corp. (IBM)
Itek Corp.
[illegible]
Development in information handling systems
Publications
Jonker Business Machines, Inc.
Preliminary proposal I.—Outline plan for nationwide network for the
flow of scientific and technological information
Preliminary proposal II.—Outline plan for simple low-cost equipments
for use in a nationwide network
Machine Translation, Inc.
Report on the Unified Transfer System (UTS)
McGraw-Hill Publishing Co.
Ramo-Wooldridge
[illegible]
[illegible]
The Minicard system
Remington Rand Univac
Smith Kline & French Laboratories
Stanford Research Institute (SRI)

[illegible]










86TH CONGRESS } SENATE Report 
2d Session No. 





DOCUMENTATION, INDEXING, AND RETRIEVAL OF 
SCIENTIFIC INFORMATION


[Date illegible].—Ordered to be printed


Mr. McCLELLAN, from the Committee on Government Operations,
submitted the following 


REPORT 


OBJECTIVE 


Following adjournment of the 1st session of the 86th Congress, the 
staff of the Committee on Government Operations was directed to 
undertake a study and evaluation of progress made since 1958 in 
regard to the development of science information processing and 
retrieval programs and systems established by Federal agencies. 
Special attention was to be directed to agency actions taken to
implement recommendations made to the Subcommittee on Reor- 
ganization of the Committee on Government Operations in the 85th 
and 86th Congresses, for the improvement of science information 
retrieval processes. 

The staff was instructed to submit a report on the systems now in 
operation or being developed by the Federal agencies operating in 
scientific areas concerned with specialized programs involving as- 
sembling, coordinating, indexing, retrieving, and disseminating scien- 
tific and technological information and data required to carry on their 
programs more efficiently and expeditiously. Special emphasis was 
to be given to systems engineering, the development and utilization of 
retrieval systems, and the economical utilization of electronic machines 
or equipment now available or being designed to speed up the retrieval 
process. 


BACKGROUND 


S. 493, 80th Congress, Technical Information and Services Act, 1947 


The interest of the Committee on Government Operations in the 
development of a coordinated program for the assembly, analysis, 
indexing, storage, retrieval, and dissemination of scientific information 
began when the committee was first established in the 80th Congress. 
Immediately after its creation, under the provisions of the Legislative
Reorganization Act of 1946, the committee (then the Committee on
Expenditures in the Executive Departments) held hearings on a bill 
(S. 493), introduced by Senators Fulbright and Aiken (then chairman 
of the committee) to provide for the coordination of agencies dissemi- 
nating technological and scientific information, and for the more 
efficient administration of an information exchange program. 

The bill was supported by many of the leading authorities of the 
scientific community who were familiar with the deficiencies in the 
Federal structure in this field. Its immediate objective was to enable 
the Government to continue some of the scientific operations which 
had been developed during World War II, such as the National 
Inventors Council (NIC), the Office of Scientific Research and De- 
velopment (OSRD), etc. The proposed program was also designed 
to provide a medium for analyzing, classifying, and distributing the 
vast amount of technological information gathered by these organiza- 
tions, other Federal agencies, and by special teams composed of scien- 
tists, engineers, technicians, and researchers in Germany following the 
war. 

S. 493 proposed that the Office of Technical Services in the Depart- 
ment of Commerce undertake to assemble, analyze, translate, and dis- 
seminate such information as was found to have potential benefit 
to American industry, and to develop, through the utilization of the 
facilities of the National Bureau of Standards or private research 
laboratories, those ideas or scientific formulas which might be found 
to be of value to American industry as a whole. 

Dr. Vannevar Bush, Director of the Office of Scientific Research 
and Development, expressed agreement with the basic objectives of 
the bill stating that— 


improvement of means for tapping the vast store of native 
ingenuity of American people, for the sifting of the ideas 
thus brought to light, for developing those of potential value, 
and for making them readily available to users is wholly 
desirable. 


The then Secretary of Commerce, William Averell Harriman, 
pointed out that— 


The great majority of business firms are at a disadvantage in 
securing the benefits of up-to-the-minute technological knowl- 
edge because they cannot afford, in view of their relatively 
limited size, to employ or contract for the research personnel 
and activities necessary to channel to their own uses the vast 
reservoir of present-day technology. 


The then Secretary of Agriculture, Clinton P. Anderson, advised the
committee that it was the belief of the Department of Agriculture 
that— 


legislation for the purposes indicated would provide desirable 
stimulation to the development of new discoveries. 


The then Acting Secretary of the Navy, John L. Sullivan, stated 
that— 


The Navy Department considers that the establishment of a 
centralized agency in the Government having full information
on all inventions and discoveries, patented or unpatented,
which results from current research and development
would be desirable * * *.


This position was also reaffirmed by testimony before the committee 
by Rear Adm. Paul F. Lee, Chief of Naval Research. 

The Board of Directors of the American Chemical Society informed
the committee that— 


Our own military efforts resulted in important additions to 
scientific knowledge and in great and far-reaching improve- 
ments in industrial techniques. The proper authorities are 
still engaged in declassifying such of this work as is not of 
military significance. But mere declassification of material 
is not enough. It must be made available to all who may
wish to use it. To accomplish this requires a clearing- 
house of the type authorized by this proposed legislation. 
We believe that such a purpose has great value and that it 
can only be accomplished by a Federal agency. We wish, 
therefore, to record our support to the proposal for this 
objective as contained in the bill now under discussion. 


Maj. Gen. Henry S. Aurand, Director of the Research and Development
Division of the General Staff, stated for the War Department:


The wide dissemination of scientific and technical infor- 
mation is the cornerstone of scientific progress. At the 
present time there is no central clearinghouse either in the 
Government or in private life for the broad field of scientific 
and technical information. Government or private research 
organizations in instituting new projects must sometimes 
spend considerable sums on preliminary documentary re- 
search in order to obtain background material from many 
different technical libraries and document collections 
throughout the United States. Presently existing library 
procedures are not adequate for the proper cataloging and 
indexing of technical information from a research and devel- 
opment point of view, since they were designed for the 
general user and not specifically for the needs of the re- 
searcher in seeking new fields of knowledge and new applica- 
tions of existing fields. The provision for a single agency to 
record and disseminate scientific information for scientific 
and technical users should result in reduced cost of research 
and in accelerated advancement of scientific knowledge. 


The Research and Policy Committee of the Committee for Economic 
Development issued a report under date of June 12, 1947, in which it 
was pointed out that a businessman needs a method of pooling experi- 
ence to provide him with the research information that is being gath- 
ered from many sources. The committee reported that— 


Government agencies now have available many different 
kinds of statistics, research findings, information about new 
scientific discoveries, and special reports and useful data of 
all kinds. These services and materials should be made more 
widely available and should be brought more effectively to 
the attention of the businessman. 









Creation of National Science Foundation (NSF) 


The bill, S. 493, was reported favorably to the Senate on June 27, 
1947 (S. Rept. 395, 80th Cong.). The Senate took no further action, 
due to the fact that scientists and industry generally opposed some of 
the features of the bill. Instead, the Congress, in attempting to over- 
come this opposition, and upon the recommendation of Dr. Vannevar 
Bush, Director of the Office of Scientific Research and Development, 
and a number of witnesses who testified on S. 493, created the National
Science Foundation, legislation for which had also been pending in Congress, as had
a bill similar to S. 493 (S. 1248), in the 79th Congress. (A summary
of the legislative history of S. 1248 and S. 493 is contained in a committee
print of a report on “Government Assistance to Invention and
Research” (pp. 70-85), dated December 28, 1959, compiled by the
Subcommittee on Patents, Trademarks, and Copyrights, Senate Committee
on the Judiciary.)

The NSF was placed under the direct control of members of the 
scientific community, with supervision and direction by a 24-man 
Board, authorized to establish policy and supervise the activities of 
the Director of the National Science Foundation, rather than under 
a Cabinet officer who would be vested with proper operating author- 
ity and who would be directly responsible to the President. 

Dr. Alan T. Waterman, Director of the National Science Founda- 
tion in a letter to the committee dated May 10, 1960, commented on 
this section of the report, as follows: 


We would like to point out that the National Science 
Board is composed of 24 members appointed by the Presi- 
dent by and with the advice and consent of the Senate and 
of the Director of the Foundation ex officio. The National 
Science Foundation Act of 1950, as amended, states that: 

“* * * persons nominated for appointment as members 
of the Board shall be (1) eminent in the fields of the basic 
sciences, medical sciences, engineering, agriculture, educa- 
tion, or public affairs; (2) selected solely on the basis of es- 
tablished records of distinguished service; and (3) so selected 
as to provide representation of the views of scientific leaders 
in all areas of the Nation.” 

In general, policymaking is primarily the function of the 
National Science Board. In addition to being responsible 
for those policy matters which are not the responsibility of 
the Board, the Director serves as chief executive officer of 
the Foundation. It should be noted that the Director of the 
Foundation is appointed by the President, by and with the 
advice and consent of the Senate, and that he is, therefore, 
directly responsible to the President. 


A subsequent act of Congress (Public Law 776, 81st Cong., the
Technological and Scientific Act of 1950), gave the OTS authority to 
act in the information processing field, as proposed by S. 493, but the 
funds necessary to operate the program effectively were never appro- 
priated by the Congress. 


Proposed Science and Technology Act of 1958 

At the direction of the committee, the staff, in the 85th Congress, 
undertook a study to analyze and evaluate the various Federal activities
which related to science and technology. As a result of this study,
the chairman submitted to the Senate, under date of April 17, 1958,
a summary of proposed legislation entitled “Science and Technology
Act of 1958 (S. 3126),” which proposed the creation of a Department
of Science and Technology; the creation of standing Committees
on Science and Technology in the Congress; the establishment
of National Institutes of Scientific Research; the authorization of a 
program of Federal loans and loan insurance for college or university 
education in the physical or biological sciences, mathematics, or 
engineering; and the establishment of scientific programs outside of 
the United States (S. Doc. 90, 85th Cong.). 

Title I of the bill proposed the creation of a Department of Science 
and Technology, and provided that the Office of Technical Services 
of the Department of Commerce be transferred to the new department.
The bill would have established a Bureau of Technical
Services with responsibility for (a) developing a complete science
information program, utilizing all facilities of the Federal Government
now vested in agencies which operated related programs; (b) acquiring,
in cooperation with other public or private agencies, scientific literature,
both from foreign and domestic sources; (c) establishing necessary
facilities within the Bureau, or in other public or private agencies,
to collate, declassify, translate, abstract, index, store, retrieve, and
disseminate information essential to the development of scientific and
technological programs as may be determined to be in the national
interest and consistent with security requirements; and (d) encouraging
the elimination of duplication of effort through the integration and
coordination of functions vested in the Bureau and in other agencies.
The bill also would have given the Bureau added responsibility for 
the development and utilization of mechanical aids and new devices 
for collating, translating, abstracting, indexing, storing, and retrieving 
scientific and technological information under the control of the 
Federal Government, and to coordinate such data as may be available 
from other sources (S. Doc. 90, 85th Cong., p. 60). 

At the request of the chairman of the Senate Special Committee on 
Space and Astronautics, S. 3126 was referred to that committee, on 
April 25, 1958, for further consideration. It was agreed, however,
that the Committee on Government Operations would retain juris- 
diction over that section of the bill relating to the processing of 
scientific and technological information. 


National Defense Education Act of 1958 


The Congress, in recognition of the need for the services which 
would have been provided for by S. 3126, amended the National 
Defense Education Act of 1958 (title IX, Public Law 85-864) pro- 
viding for the establishment within the National Science Foundation 
of a Science Information Service (1) to provide, or arrange for the 
provision of indexing, abstracting, translating, and other services 
leading to a more effective dissemination of scientific information, 
and (2) to undertake programs to develop new or improved methods, 
including mechanized systems, for making scientific information more 
readily available. 

The act also provided that the National Science Foundation shall 
establish a Science Information Council, including, ex officio, the 
Librarian of Congress, the Director of the National Library of Medicine,
the Director of the Department of Agriculture Library, and the
head of the Science Information Service to advise, consult with, and
make recommendations to the head of the Science Information Service. 

As will be set forth elsewhere in this report, there is serious doubt 
as to whether or not this type of organization is providing or ever will 
provide the necessary services required by the Federal Government 
in the scientific documentation field, primarily because the NSF is a 
nonoperating agency and its policy responsibilities are widely diffused 
among representatives of all fields of science. 

Senator Hubert H. Humphrey pointed out, in an article which 
appeared in the January 1960 issue of the Annals of the American 
Academy of Political and Social Science (p. 33), that— 


Those who place their reliance on coordinating devices 
such as interagency committees for improving communica- 
tions and for providing stimulus to Government science 
activities fail to recognize the built-in limitations of these 
approaches. By their very nature interagency committees 
cannot be creative except at a very low level of operation. 
They are inevitably restricted by the view of the most pedes- 
trian and unimaginative members. Their product will 
almost certainly be a kind of lowest common denominator of 
their combined ideas. What is needed is dynamic, forceful, 
and continuing leadership which could best come from a 
cabinet department with clear-cut responsibility and author- 
ity in the field of science and technology. 


Science Program—86th Congress (S. Rept. 120) 


Legislation was introduced in the 1st session of the 86th Congress 
(S. 676) providing for the creation of a Department of Science and 
Technology, designed to implement the committee’s proposed pro- 
gram. The revised bill contained no provision relative to scientific 
information processing, in view of the action already taken by the 
Congress vesting this responsibility in the NSF. The committee 
therefore directed the staff to prepare, in cooperation with officials of 
the NSF, a résumé of (a) actions taken to implement the information 
program, (b) its present status, and (c) what further actions might be 
required to provide guidance and support for its full development,
for further committee consideration. Pursuant to this committee 
directive, a report was submitted to the Senate on March 23, 1959, 
entitled “Science Program—86th Congress” (S. Rept. 120, 86th
Cong.), which set forth actions taken or proposed with the objective 
of perfecting an efficient program for assembling, processing, and 
retrieving scientific information, including the status of the Federal 
science information program as of that date. 


Establishment of a Commission on a Department of Science and Tech- 
nology 

At hearings on S. 676 and related bills during the 1st session of the
86th Congress, some of the witnesses supported the objectives of S. 
676 and others expressed opposition, at least until more information 
was available, as to which agencies of the Federal Government should 
be incorporated into the proposed new Department or in any agency 
that may be established for the centralization of such activities. 

The committee approved recommendations made at the hearings, 
that there was an urgent need for the appointment of a commission, 
patterned along the lines of the Hoover Commission, to conduct a
study as to whether or not a Department of Science should be created,
and, if such a department was found to be desirable, that the proposed 
Commission should recommend to the President and to the Congress 
which functions now being performed by other departments, agencies, 
and independent establishments of the Government should be trans- 
ferred to such department. 

The committee, after further hearings on a bill, S. 1851, providing 
for the creation of such a commission, to be composed of authorities 
eminent in the field of science who are recognized leaders of the 
scientific community, representatives of the Federal Government 
agencies who were engaged in basic civilian science activities, and of 
members of the legislative branch of the Government, submitted a 
favorable report to the Senate (S. Rept. 408) on June 18, 1959. S. 1851
is presently pending on the Senate calendar. 


NEED FOR IMPROVEMENT OF SCIENCE INFORMATION SYSTEMS


In accordance with a further directive from the committee, members 
of the staff conferred with representatives of selected agencies during 
December 1959 and the first quarter of 1960, relative to their experi- 
mentations with, limited use of, and future plans to use mechanized 
systems for assembling, translating, and disseminating scientific 
information. The study was limited in its coverage and restricted to 
those agencies most representative of Government operations in the 
fields closely related to the development of effective programs for 
the improvement of information processing in the fields of science and 
technology. 

Representatives of all agencies generally agreed that there was an 
urgent need for the development of improved systems of engineering 
and for the installation of mechanical electronic retrieval equipment 
adaptable to specific programs, in order to make certain that all 
available scientific information would be readily accessible to Govern- 
ment agencies and to members of the scientific community. As will 
be set forth in this report, many Federal agencies, particularly the
AEC, CIA, ASTIA, Patent Office, and the National Library of Medicine,
are making rapid progress in evaluating and developing such
mechanized systems. 

In the view of the staff, however, other agencies are not placing 
proper emphasis on the importance of expediting the improvement 
of existing programs. Some of them, however, have initiated studies 
to determine (1) whether automation would result in more rapid in- 
formation retrieval and promote greater economy of operation; (2) 
whether a system can be devised and the necessary mechanized equip- 
ment is available to meet their specific needs; and (3) whether a 
changeover would expedite the processing of their own scientific 
information, and enable them to effect economies and overcome 
existing deficiencies in operations. Also, consideration is being given 
to the effect such a conversion from manual to mechanical operation 
might have upon their present index, abstract, and punchcard systems,
and whether the present procedure could be efficiently transposed
to mechanization without the loss of data and information
essential to their programs. 

Another problem which agency officials must consider is the uncertainty
as to what type of system and equipment would best serve
the needs of these individual agencies, with proper regard to the cost
involved. As will be illustrated in this report, the development of 
engineering systems and the production of mechanical equipment for 
information retrieval is a highly competitive business under rapid 
and continuous development. It is therefore difficult, in some in- 
stances, for officials responsible for the installation of such programs 
to determine the merits of the claims and counterclaims being made 
by the industry until they are better informed as to the specific 
system required to meet their requirements. A further obstacle 
involves the procurement of the necessary funds to fully implement 
the proposed programs.

The need for improvement of science information programs in the 
United States is illustrated by a recent article by Oscar F. Gavrilovich 
of the Herald Tribune News Service which appeared in the Washington
Post under date of April 17, 1960, entitled “Soviet ‘I.R.’ Processes
World’s Science News,” from which the following is extracted: 


On river-front Bereshovsky Boulevard in Moscow an im- 
posing seven-story building houses the headquarters of an 
organization more powerful, disciplined, and far-reaching 
than the most elaborate espionage system conceivable. 

Title of the organization is given as the simple initials, 
“I.R.” Strangely enough, these stand for the words in 
English, not Russian: “Information Retrieval.” 

The function of I.R. is to gather quickly and collate prop- 
erly every item of scientific or technological importance pub- 
lished everywhere around the globe, in whatever language 
the item may be printed. Speedily translated into Russian, 
all such material—in such varied fields as chemistry, physics, 
agriculture, metallurgy, medicine, and, of course, military 
and nuclear research—is made immediately available to that 
most favored class of Soviet society: its research-scientists. 

The hugeness of the task may be realized when the I.R. 
itself computes that the annual world output of scientific 
writing to which it has access includes 60,000 books, 100,000 
research treatises, 55,000 magazines, and about 1,200,000 
individual articles. Besides, I.R. endeavors to obtain the 
written description of newly patented inventions, and keeps 
a watchful eye for casual mention in general newspaper 
columns of any research in process or soon to be begun. 


Taken in stride 


The assembly of such voluminous data—plus immediate 
and accurate translation—is a job that might stagger veteran 
editors. Yet the staff of this unusual institute, under urgent 
governmental goading, takes it in stride. 

Just who comprise this energetic staff? At its start, about 
10 years ago, I.R.’s organizers undertook to assemble a 
battalion of translators. And these, they insisted, must 
not be mere interpreters. They must also be experts thoroughly 
trained by education or practical experience in whatever 
field was covered by the material to be translated— 
they had to be able to evaluate the subject matter intelli- 
gently. 










Between 2,500 and 3,000 highly trained persons are em- 
ployed at the I.R. building. 

These, however, comprise merely the top echelon. The 
I.R.’s entire actual “working force”—its “adjunct-employees,” 
so to speak—may number several hundred thousand. 
Of these, many are resident citizens of the Soviet Union, 
people whose services are called upon on a part-time basis 
whenever material in their particular field of specialization 
arrives at headquarters. 


Is no secret 


Many thousands of others are attached to Soviet embas- 
sies, legations, consulates, trade missions, or functioning as 
‘technical advisers’ in countries all around the globe. All 
these field workers for I.R. are charged with the duty of 
transmitting to headquarters every book, periodical, or clip- 
ping which may fall into their hands. 

The Russians boast of the efficiency of I.R., and therefore 
it should be clear that this is no secret organization. Indeed, 
the general public, including even foreign visitors, is granted 
admission to at least two floors of the seven-story edifice. 
Access to the remaining five stories is permitted only to 
accredited Russian scientists, writers, educators, and the like. 

Pravda much further explains the intricacies of the I.R.’s 
functioning. It states that in important, advanced nations, 
such as the United States, France, Japan, Great Britain, and 
Red China, embassies have entire departments solely en- 
trusted with the job of keeping I.R. supplied with data. 

In the backward nations, personnel would, of necessity, 
be much smaller, but under strict injunction to forward word 
of any crop blight, economic discontent, and so forth. Thus 
I.R. combines the functions of the world’s leading news- 
gathering agencies with those of maintaining possibly the 
world’s largest scientific research library. 


SUMMARY OF AGENCY ACTIVITIES 


Brief summaries and basic aspects of the programs which have been 
or are being developed by some of these agencies follow. Details of 
the operations of the respective programs are set forth under part I 
of this report relating to the operations of the designated agency. 


DEPARTMENT OF DEFENSE (DOD) 


The Secretary of Defense reported to the committee that, during 
1959, research and engineering efforts within the Department of 
Defense were further strengthened by the issuance of specific terms of 
reference for the Director of Defense Research and Engineering which 
charge him with three major duties: (1) to be the principal adviser 
to the Secretary of Defense on scientific and technical matters; 
(2) to supervise research and engineering activities within the Depart- 
ment of Defense, including those in the National Security Agency, the 


Advanced Research Projects Agency, and the Defense Atomic Support 
Agency, as well as those in the military departments; and (3) to direct 
and control (including assignment and reassignment) the research and 










engineering activities that the Secretary of Defense deems to require 
centralized management. 

The staff in the office of the Director of Defense Research and 
Engineering was reorganized to provide a broader and more realistic 
evaluation of all research and engineering activities in the Department 
of Defense as well as improved direction and coordination in that vital 
field. The basic change in the organization was the establishment of 
positions for assistant directors to evaluate plans and supervise 
programs in operational systems related to air defense, communica- 
tions, tactical weapons, strategic weapons, undersea warfare, and 
special projects. Separate staffs were continued to coordinate pro- 
grams in technical fields such as aeronautics; atomic, biological, and 
chemical warfare; electronics, fuels, materials and ordnance; guided 
missiles; maintenance engineering; and science. The latter staffs 
also provide technical advice to the assistant directors, as required, in 
connection with operational systems. 

Following a conference with officials of ASTIA, the results of which 
are set forth in other sections of this report, the staff made repeated 
requests in December and January, to officials of the Department 
of Defense, for information and reports on certain specific activities of 
DOD in science information retrieval—operations of ONR (ASW), 
Project AFCIN and Air Force expenditures and utilization of the 
Minicard system—without result. These requests were confirmed by 
letter dated April 18, 1960, with which was enclosed a galley proof 
of the proposed report, and later by telephone. Dr. Orr E. Reynolds, 
Director, Office of Science, Office of the Director of Defense Research 
and Engineering, DOD, replied on May 5, 1960, as follows: 


Thank you for giving me the opportunity to review the 
galley proof of your report on “Documentation, Indexing, 
and Retrieval of Scientific Information” personally before its 
official circulation. Unfortunately, I was on travel status 
when this draft arrived and it has not been possible for me to 
frame comments on the report in time for your May 1 dead- 
line. I have also not as yet been able to obtain comments 
from Colonel Dunlop or Dr. Larrick as requested in your 
letter of April 18. 

I have reviewed the report from a standpoint of factual 
accuracy in areas of my own personal knowledge and official 
cognizance. I have not reviewed it from the standpoint of 
interpretation or conclusions of your staff; nor from the 
standpoint of general organization of the material; nor with 
respect to programs falling outside the cognizance of my 
office. In this respect I recommend that information relative 
to Project AFCIN and the Air Force utilization of the 
Minicard system be obtained from the Chief of Staff, USAF. 


ARMED SERVICES TECHNICAL INFORMATION AGENCY (ASTIA) 


The Armed Services Technical Information Agency is the principal 
documentation center for unpublished technical and scientific reports 
which are issued as a result of, or relate to, research and development 
projects of the Department of Defense. It is operated by the Re- 
search and Development Command of the Air Force, under the policy 
direction of the Secretary of Defense. The agency is financed and 








operated as a centralized technical information center for the Army, 
Navy, and Air Force. It maintains and cross-services all of the re- 
search and development contracts with private concerns for all three 
branches of the service. 

ASTIA’s principal function is to catalog, abstract, index, and store 
military classified and unclassified scientific and technical reports; 
catalog their availability; release them upon receipt of requests in 
support of defense projects; provide bibliographical services and develop 
documentation standards in coordination with other Federal 
agencies. The agency also issues a technical abstract to Department 
of Defense agencies and others engaged in military or defense research. 
A part of this information-processing system, heretofore a manual 
operation, has now been partially mechanized. 

The services of ASTIA are based on the concept that technical 
documents generated as a result of research and development ac- 
tivities can effectively serve the national interest only if they are 
promptly “‘put to work’’—i.e., made available to those who require 
them in support of military research and development. 

The principal deficiency in this program is that it is confined to 
information emanating only from the Department of Defense, indus- 
tries, universities, and other research institutions generating informa- 
tion through Defense contracts, the NASA, CIA, and the AEC, but 
does not include scientific information generated by other sources 
outside of the Federal Government, other than in exceptional cases. 
A summary report of this program and activities of this operation, as 
submitted by officials of ASTIA, is included in this report under 
part I. 

CURRENT ARDC TECHNICAL EFFORTS (CATE PROGRAM) 


Lt. Col. James O. Vann, CATE project officer, submitted to the 
committee on May 11, 1960, a copy of a “Vocabulary for Current 
ARDC Technical Efforts” which contained a brief statement setting 
forth general information relative to the objectives of the CATE 
program as follows: 


1. The CATE program is an information system designed 
to provide a source for quickly identifying and locating the 
scientists and engineers working in the technical fields of 
interest to the Air Force. The purpose of the CATE pro- 
gram is to promote the interchange of technical information 
among the scientists and engineers in the Air Research and 
Development Command (ARDC), its staffs, governmental 
agencies, industries, and universities. The CATE program 
applies to ARDC research and development contracts. 
Reporting on other research and development efforts is not 
required but these efforts may be reported at the option of 
the contractor. 

2. The unit of information stored in the program is called 
a technical effort. A technical effort is defined for the pur- 
pose of this program as a singular investigation, either re- 
search or development, to discover knowledge for new mili- 
tary concepts or to explore or exploit present knowledge for 
new military capabilities. The technical effort is as detailed 
a division of current work as practical. Each technical 
effort is reported to the program on a specially prepared 







CATE card. CATE cards and vocabularies will be available 
at the designated contact office in each participating 
agency. 

3. These technical efforts are the work units performed 
by the engineers and scientists located in the laboratories of 
the Air Force and the Air Force contractors. Program re- 
porting has been simplified to promote the free and rapid 
exchange of information. The maximum value of the 
CATE program to the Government—industry—university 
technical team can only be attained by the wholehearted 
and complete reporting of all current technical efforts. 

4. Only a small number of technical efforts reported are 
expected to be of a classified or proprietary nature, since the 
brief information reported does not disclose processes, tech- 
niques and purpose. In all cases, if the information is classi- 
fied or proprietary, all existing regulations and policies of 
governmental and nongovernmental agencies and contractors 
will be followed. 


Preliminary to hearings scheduled to be held by the House Com- 
mittee on Science and Astronautics, beginning on May 11, 1960, on 
mechanical translation devices, the chairman, Representative Overton 
Brooks, announced that the Air Force had developed an intricate 
translation machine at its Rome Air Development Center. No infor- 
mation relative to the status of the project under which the electronic 
translator was developed at Griffiss AFB was submitted to the staff 
of this committee in response to its request for such data. 


OFFICE OF NAVAL RESEARCH (ONR) 


The Office of Naval Research advised that it utilizes the information 
generated by ASTIA, and depends to a large extent upon the data 
now being made available through that Agency. The question of 
establishing an information retrieval system in specific fields is, 
however, under active study and consideration. For instance, one 
such project now being studied by ONR is the need for information 
required in the development of the Navy’s antisubmarine warfare 
(ASW) program. Involved in this project is information on missiles, 
oceanography, radar, meteorology, etc., and the scientists in ONR 
are attempting to devise a mechanical retrieval system which would 
speed up the procurement of scientific and technological information 
necessary to fully expedite this important operation. Consideration 
is also being given to enlisting the assistance and active cooperation 
of engineering and equipment industries in the development and 
perfection of an adequate system and the utilization of mechanized 
equipment that may be employed to perfect a retrieval process ade- 
quate to meet the needs of the ASW program. 


OFFICE OF DEFENSE RESEARCH AND ENGINEERING 


Representatives of the Office of Defense Research and Engineering, 
Department of Defense, informed the staff that studies are underway 
to determine the feasibility of various mechanical information retrieval 
systems within the Department, and that much reliance is being placed 
upon ASTIA’s services. 











The Department has also enlisted the special facilities of others 
qualified in certain areas of science. For instance it was reported that 
a specialized program on information research in the field of strategic 
metals research and development (titanium, lithium, etc.) is being 
conducted by Battelle Memorial Institute under a U.S. Air Force 
contract. 


BIO-SCIENCES INFORMATION EXCHANGE (BSIE), SMITHSONIAN 
INSTITUTION 


The BSIE was not designed as a “documentation center.” The 
services it performs in this field are a byproduct of its primary objec- 
tive of preventing unknowing duplication of research awards among 
the agencies that support it. 

Several agencies support and use the Bio-Sciences Information 
Exchange program now being operated within the framework of the 
Smithsonian Institution. This BSIE service has been in existence for 
some time. It was first operated within the National Institutes of 
Health, then by the National Research Council, and finally was 
transferred to the Smithsonian Institution. Within the Smithsonian 
Institution, it apparently functions as an autonomous group furnishing, 
broadly speaking, project control information to those governmental 
agencies engaged in the extramural support of research in 
the biosciences. 

The Smithsonian Institution fully supports an exchange of information 
on current research. Discussions are in progress concerning 
the establishment of a Science Information Exchange which will 
encompass both the biological and physical sciences. The Governing 
Board of the Bio-Sciences Information Exchange has formally ex- 
pressed a willingness to expand the Exchange into the physical sciences 
and to enlarge the Board to provide adequate representation for the 
physical sciences. Should expansion become reality, the staff will be 
increased to include competencies in the necessary spectrum of the 
new disciplines and the Exchange will continue to make use of new 
developments to bring about maximum efficiency. 

The staff has been informed by some qualified experts in this field, 
and by BSIE, that the services rendered by BSIE in the past have 
not covered the field completely and that most of the Government 
agencies have maintained their own records to accomplish the same 
purpose for which this agency was created. In commenting on the 
operations of the BSIE, relative to alleged duplications of services, 
NSF contended that: 


The existence of a kind of central information facility such 
as BSIE does not, it seems to us, imply that all users of the 
service should cease similar or related operations of their 
own. Regardless of the effectiveness of the central facility, 
it may in some cases be in the best interests of good research 
management for individual agencies to continue to maintain 
internal records of their own agency research projects, pro- 
vided this activity does not unnecessarily increase or dupli- 
cate the workload of reporting to the central facility. 









The BSIE also pointed out in this connection that— 


many Government agencies keep records for their internal 
needs, but the BSIE was created as a mechanism for exchange 
of information between agencies on applications received and 
grants awarded. 


Dr. Ernest M. Allen, Chief, Division of Research Grants, NIH, 
did not concur with some of the points raised by the staff and, in a 
communication addressed to the committee dated April 28, 1960, 
stated: 


The Bio-Sciences Information Exchange was established 
in order that the various Government agencies providing 
support of biological and medical research might have 
knowledge of both applications received by other agencies 
and grants and contracts awarded. Upon invitation, the 
major foundations and voluntary agencies joined hands with 
these Government agencies in exchanging this type of infor- 
mation. The Public Health Service has found the data 
obtained to be of great value. We have received 14,000 
applications for research grants this year, for example, which 
required review by expert advisory committees. The mate- 
rial furnished by the Bio-Sciences Information Exchange on 
grants awarded and applications pending for the scientists 
concerned in these applications has been available at the 
various committee meetings and has been of material assistance 
in the review and evaluation afforded. Without the 
benefit of the data provided by the Exchange, we would be 
forced to do a less than adequate review job. 

A secondary service provided by the Bio-Sciences Infor- 
mation Exchange has been the classifying of all grants and 
contracts for the various participating agencies by subject 
field. This classification system has permitted the Exchange 
to render valuable service to the Division of Research Grants, 
National Institutes of Health, in the preparation of replies to 
scientists who need information on work by other scientists in 
the same field. Furthermore, quarterly and special reports on 
the major items of this classification system have been help- 
ful to us in our overall planning. 

I don’t know who your expert advisers were, but I can 
assure you that, insofar as the National Institutes of Health 
is concerned, the advice did not represent those of us in 
official charge of the program. 


The staff also received a similar communication, dated April 29, 
1960, from Dr. Charles L. Dunham, Director, Division of Biology 
and Medicine, AEC, from which the following is quoted: 


As one of the participating agencies, I would like to assure 
you that the Bio-Sciences Information Exchange has on every 
occasion rendered completely satisfactory service in response 
to numerous requests, as well as in its routine activity. 

As a matter of fact we are so satisfied with the activities of 
BSIE that we are at the present time asking all of our “on-site” 
national laboratories and large projects to put their program 
into the Exchange. We have had a very satisfying 







response from all of the directors, agreeing that this would 
be a valuable service. 

We wanted to let you know that the Division of Biology 
and Medicine of the U.S. Atomic Energy Commission was 
not one of the Government agencies you quote as being 
dissatisfied. 


In his letter to the staff, Dr. Leonard Carmichael also made refer- 
ence to a statement prepared by Dr. C. W. Shilling, Chairman of the 
Governing Board of BSIE, for the Subcommittee on Reorganization 
of this committee, further clarifying the functions and operations of 
BSIE. This material is to be incorporated in a supplemental report 
on certain aspects of the operations of the Government in medical 
and international health fields now being prepared by the subcom- 
mittee staff. 

The Office of Defense Research and Engineering, and other Federal 
agencies, are also discussing the establishment of the proposed in- 
formation exchange service in the physical sciences, now being estab- 
lished under the administrative control of the Smithsonian Institution. 
While it is conceded that the present service of BSIE in the field of 
biosciences may ultimately be improved through mechanization, it 
will unquestionably take some time before the system is perfected, 
and it is the view of the staff that its present responsibilities should 
not be extended into more areas until it has demonstrated its capability 
to perform satisfactorily in its present field, and adequate steps are 
taken to avoid unnecessary duplication. Questions have also been 
raised by industrial groups as to the desirability of extending this 
service into the physical science field when there are available private 
companies which it is reported have the systems and equipment nec- 
essary to undertake such a program on a proper basis and to start 
producing at an early date, and at an estimated cost well below BSIE 
costs, of approximately $430,000 annually. 

In regard to these allegations, the NSF commented that— 


We feel the report reflects a misunderstanding between 
executive or administrative direction and that of project 
operation. We believe that executive direction and program 
coordination of the biological and physical sciences informa- 
tion exchanges should remain within the Federal structure. 
This does not prejudice in any way choice of means, including 
private contractors, to operate the exchanges. 


ATOMIC ENERGY COMMISSION (AEC) 


The AEC has taken aggressive and effective action in the develop- 
ment of an information processing system. Its mechanization pro- 
gram adequately supplies interested Government agencies, industry, 
medical and other scientific groups, universities and libraries, with 
scientific and technological data in the nuclear energy field. It also 
compiles and abstracts the world’s most comprehensive scientific in- 
formation collection in the field of nuclear science, and bibliographi- 
cally organizes, packages, and distributes for the use of all peoples this 
body of knowledge of the atom and its application for peaceful pur- 
poses. 

Officials of the AEC were very cooperative in supplying the staff 
with complete details of information processing procedures, which 










were most impressive. The machine operations are located at its 
Oak Ridge laboratories, however, and the staff did not see the system 
in actual operation. A comprehensive summary of the activities of 
the AEC, as supplied to the staff, is set forth in part I of this report. 

The AEC has also set up a Committee on Information Systems in 
an effort to further its already advanced program. That Committee 
recently determined, however, that there are a number of questions 
that must be settled before it can proceed further with the task of 
finding the optimum machines system for handling the AEC technical 
information problems that still remain. The committee took the view 
that there are machines existing that have the capacity for handling 
the task under certain conditions, but that the kind of application to 
be made and the requirements to be set on the system as to the scope 
of coverage, speed of operation, depth of retrieval, ease of maintenance, 
and economy of operation will make a large difference in the choice 
of systems and machines. The Committee in its preliminary findings 
on February 18, 1960, which are also included in part I of this report, 
set forth a series of steps which should be taken before a determination 
can be made for a permanent expansion of its present program. 


CENTRAL INTELLIGENCE AGENCY (CIA) 


Officials of the CIA were most cooperative, and arranged an exten- 
sive briefing for the committee staff and an onsite survey of the various 
systems, methods, and machines which it utilizes for the storage, indexing, 
abstracting, and retrieval of information and data. The staff was 
much impressed with the advanced stage of the Agency’s automatic 
data-processing activities, and by the fact that the CIA has developed 
a number of comprehensive independent systems to meet its special 
needs, as well as machines to implement these systems. 

It is the judgment of the committee staff that the CIA and the 
AEC have made the most progress and achieved the greatest advance- 
ment of all Federal agencies in the field of information processing. 
From the material furnished during the briefing, and the demon- 
stration of the complex equipment, it is obvious that some of this 
country’s leading systems engineers, scientists, and information 
technicians have been consulted or have worked with the officials of 
CIA in developing what the staff believes to be the most comprehensive 
information system now in operation, many aspects of which have 
been mechanized. 

In the course of this review of the agency’s activities in this area, 
the staff learned that CIA has been testing the Minicard system, which 
was especially developed by the Eastman Kodak Co. under Air Force 
contract, for the “quality” of retrieval. The test has been long, 
extensive, and thorough, with major emphasis on the intellectual 
problems involved in coding, designing code systems, and the utilization 
of new machine capabilities never before available to a library 
function. The staff has been informed by the Eastman Kodak Co. 
that the system did retrieve more relevant documents and not as many 
nonrelevant documents as the Intellofax system. Test results to date 
indicate that Minicard is perhaps better suited to special rather than 
general applications. 

However, the system and procedures developed for the quality 
evaluation test would be quite costly if utilized for regular full-scale 





production. Consequently, this area must be carefully studied and 
tested in order to devise an “operational system.’”’ (The supplier, 
Eastman Kodak Co., estimates that the cost of operation under a 
Minicard system will be less than that of the Intellofax system.) 

Although the staff is not in a position, for security reasons, to 
explain in detail the type of equipment used, the methods and pro- 
cedures followed in the storage and retrieval of information by the 
CIA, the Agency was requested to prepare a brief outline of the 
principles, objectives, and progress made in mechanical information 
storage and retrieval, which will be found under part I of this report. 

One such project, WALNUT, which is being perfected by officials 
of CIA and IBM, involves research work undertaken on advanced 
techniques for microfilmed document image storage and a complete 
systems study. In the IBM final report on this project, covering a 
period of development from September 1958 through June 1959, a 
summary of progress is outlined, which sets forth a total of 11 con- 
clusions and recommendations regarding the operation of the program. 


LIBRARY OF CONGRESS 


The Library of Congress is one of three major U.S. Government 
libraries engaged in the collection, storage, and dissemination of 
scientific and technological materials, documents, and other related 
information, the other two being the National Library of Medicine 
and the Department of Agriculture Library. There are several other 
agencies such as the Department of the Interior (mining, petroleum) 
and the National Bureau of Standards (mathematics and physics) 
which also maintain significant libraries. 

The Library has a Science and Technology Division which serves 
as a collecting and reference center for unclassified, technical report 
literature for the United States and foreign countries. The output 
of a number of Government agencies and their contractors, including 
the Atomic Energy Commission, the National Aeronautics and Space 
Administration, and agencies within the Department of Defense, 
forms a good part of this literature. In addition to its other activities, 
the Division has a research program in documentation techniques, the 
objective of which is to make the Library’s science collections more 
responsive to current trends and needs. 

As of June 30, 1958, the scientific collection of the Library had 
reached a total of 1,486,000 scientific and technological volumes, 
including monographs and serials. The staff has also been advised 
that the Library now has more than 18,000 exchange agreements with 
governments, private research centers, libraries, universities, and other 
scientific and technical institutions, and that these sources provide 
annually nearly one-half million books, pamphlets, journals, and 
other materials in printed, near-print, and photographic-copy form. 
Furthermore, the Library is now regularly exchanging publications 
with 151 scientific and technological institutions in the Soviet Union 
and an additional 154 in the satellite countries. All of these materials 
appear to be cataloged and indexed by means of the traditional card 
system. 

The staff has also been advised that the Library has introduced 
mechanical equipment for the conduct of many of its operations as a 
normal aspect of administration and management planning, and some 










attention is being given to the possibility of the mechanization of 
information retrieval processes. 

Since the completion of the hearings by this committee, 2 years 
ago, the Science and Technology Division inaugurated a research 
program in documentation techniques with the objective of making 
the Library’s science collections and the Division’s reference and 
bibliographical products as responsive as possible to current trends 
and needs. In this connection the staff has been informed that 
developments in documentation techniques are being studied with a 
view to determining which mechanisms and equipment available on 
the commercial market, or under development by industry and Gov- 
ernment, appear to be worthy of trial or other application within the 
Division, and in some instances, trials have been conducted on an 
experimental basis. 

The Library of Congress advises that it has, within this period, 
been studying mechanized information retrieval intensively as it 
might be applied to scientific and technical materials as well as to 
other informational materials in its custody. (The Library feels that 
storage and retrieval problems are basically the same, whatever the 
field.) The staff was advised by officials of the Library that, following 
consultations with representatives of industrial firms engaged in 
systems engineering and the manufacture of electronic equipment and 
surveys made within the Library by three such firms, they found 
that their card catalog system was the best method at present avail- 
able. At present this is largely a manual operation, but in publishing 
recent editions of the National Union Catalog some mechanical equip- 
ment has been utilized in the interest of greater economy and effi- 
ciency. According to Library officials, however, there is no mech- 
anized retrieval system now available that would provide scientists 
and others with the detailed data they would like to have. They 
pointed to several possibilities for data-processing approaches but 
stressed the need for further study of the Library’s operations in 
relation to mechanization, for additional systems work, for the 
development by the industry of more sophisticated machines, and, 
finally, for tests to prove economic feasibility. 

The staff has been advised that the Library is making plans for 
further studies which it is hoped will point the way to effective use 
of mechanical systems and procedures. 

The staff was informed by officials of the Department of Defense 
that the Library of Congress was not being fully utilized as a scientific 
and technological storage center by the Department of Defense 
because much of the Department’s material is classified, and that 
it is the Department’s position that a documentation storage center 
of this nature should be a part of the executive branch, and not 
under the jurisdiction of the legislative branch of the Government 
as is the Library of Congress. Further, it was pointed out that it is 
the practice of the Department of Defense and other agencies to withhold
from the Library classified and certain other information which
is now being withheld from committees of the Congress under execu- 
tive policy. This position is based upon the premise that such 
information should not be made generally available to any agency of 
the legislative branch, since it would then be available to committees 
of the Congress, and be inconsistent with Presidential policy. The 
executive branch of the Government has consistently held that
officials thereof should not provide information to the legislative 
committees when it is considered to be in the national interest to 
deny such information, or when it is related to so-called internal 
affairs of the executive branch and not a legislative concern of the 
Congress. 

On the other hand, officials of the Library do not believe that any 
significant amount of literature is being withheld from it, as alleged to 
the committee’s staff by certain officials of the Department of Defense. 
The Library has been for a number of years, and is currently, prepar- 
ing technical bibliographies, many of which include classified materials 
and some of which are entirely classified. In their opinion the figures 
show that the Library is being utilized as an information center by the 
executive agencies, including the Department of Defense. 

This appears to raise the question of whether it might be desirable 
to transfer the Division of Science and Technology of the Library of 
Congress to an executive branch agency and, if so, to which agency? 
Obviously, it would not belong in the National Science Foundation, 
since that is not an operating agency, and its activities are directed 
primarily to the support of science and research through contracts or 
grants. Nor should the functions be transferred to the Department of 
Commerce, since a study of the operations of science functions already 
lodged in that Department developed the fact that the Office of 
Technical Services and the National Bureau of Standards, which are
performing services not directly associated with the primary mission 
or even related to other functions of the Department, have not 
received the technical administrative direction necessary to insure 
their highest potential. 

Should the Department of Commerce fail to assume its responsi- 
bilities in the field of science, through the agencies already under its 
administrative control, it would become increasingly clear that there is 
a definite need for a Commission on Science and Technology, as pro- 
posed by this committee in reporting the bill S. 1851, providing for the 
establishment of such a commission, with authority to determine 
whether or not a Department of Science and Technology is desirable, 
or whether some other agency should be created to provide more 
effective operations of civilian science programs and for better coordination
of science operations. If such a department were considered
necessary, the proposed Commission would be further authorized
to recommend which Federal agencies and functions should be placed 
under its jurisdiction. In the latter case, it is the view of the staff 
that it would be logical and almost imperative that a library of science 
and technology should be included within the new departmental 
structure. 

The Library of Congress, however, is opposed to the transfer of its 
Science and Technology Division to an executive branch agency. It 
also feels that its collections in science and technology belong where 
they are—in the Library, where they can be, and are, used in con- 
junction with materials in other fields, such as education and eco- 
nomics, which vitally affect scientific and technological development. 

Some idea of the magnitude of the information-center services 
performed by the Library for the executive branch may be gained 
from the following: During fiscal year 1960, the Science and Tech- 
nology Division alone will provide 6,600 abstracts of scientific and 
technical literature to DOD agencies and 5,700 abstracts to non-DOD
agencies; for the production of these abstracts from literature appear- 
ing originally in 1 of some 25 different languages, the executive 
agencies will transfer to the Library approximately $300,000. In the 
last 2 years more than 85,000 abstracts of information in foreign scien- 
tific and technical literature have been produced by other divisions 
of the Library for the Air Force alone, and virtually all this information 
is channeled to the American scientific community through OTS. 
For this and other bibliographic work, the Air Force will transfer to 
the Library some $3,750,000 in fiscal year 1960.


DEPARTMENT OF COMMERCE 


In conducting the staff studies, in 1957, for the development of a 
reorganization program of science activities and operations within the 
Federal Government, difficulties were encountered in obtaining infor- 
mation relative to the science operations of the Department of Com- 
merce. The staff concluded at that time that there was a lack of 
complete understanding of the importance of the science functions of 
the Department; that the major emphasis was being placed upon what 
the then Secretary considered to be more important departmental 
operations in line with the primary mission of the Department. The 
staff recommended, therefore, that certain of these service operations 
should be transferred to the proposed new Department of Science and 
Technology. 

Had the attitude of the Department head which prevailed in 1957 
continued, it is doubtful whether any proposal to transfer the science 
and technology library to the Department of Commerce would have 
afforded the necessary coordination and improvement of these services, 
or the essential administrative guidance to insure its operation at the 
highest level of efficiency. The staff was advised, however, on May 9, 
1960, by the present Under Secretary of Commerce, Mr. Philip A. Ray, 
acting in behalf of the present Secretary, that this situation no longer 
exists, and that special emphasis is now being placed upon improvement
of the science functions vested in the Department. The following
is quoted from Mr. Ray’s letter to the committee:


Secretary Mueller and the other officials of this Depart- 
ment consider these agencies and the work they are doing to 
be integral and important functions of the Department of 
Commerce. We are most directly concerned with the de- 
velopment and operations of all of the agencies of the Depart- 
ment; and we consider the scientific activities, in particular, 
to be vital and important operations directly related to the 
purposes for which this Department was created. 

It was this realization and concern which prompted Secre- 
tary of Commerce Sinclair Weeks to request the National 
Academy of Sciences to undertake an evaluation of the oper- 
ations of the Department of Commerce to insure that we are 
fulfilling our responsibilities in the interest of scientific and 
technological progress. Secretary Mueller, shortly after his 
appointment, expressed his own desire that the Committee 
appointed by the President of the National Academy of 
Sciences, complete the task they had undertaken at the 
request of his predecessor. 





The Committee recently submitted to the Secretary its 
report entitled, “The Role of the Department of Commerce
in Science and Technology.” We were gratified to note the 
many instances in which the Committee praised the scope 
and quality of the scientific work being carried on within the 
Department of Commerce. Many of the recommendations 
for improvement made by the Committee, particularly some 
of those calling for increased budgetary support, have already 
been implemented. Other recommendations are under active 
consideration with a view toward their implementation wher- 
ever practicable. As Secretary Mueller said in releasing the 
report: “The joint objective of the Committee and the 
Department is to continue to find opportunities to gear our 
operations to the rapidly developing technological revolution 
and thereby to better serve the scientific community, private 
industry, and the general public.” 


U.S. Patent Office 


The Office of Research and Development, U.S. Patent Office, 
Department of Commerce, is in the process of developing a system 
for the mechanical storage and retrieval of patents and patent in- 
formation. The staff witnessed a demonstration of the system and 
machines being tested to screen patents and to retrieve appropriate 
technical patent information by means of electronic data-processing 
systems. The staff was further advised that the present limited use 
of these systems has already indicated that they would produce greatly 
increased efficiency in the operations of the Patent Office, and that
full implementation will develop some ultimate economy. 

A special program relating to the retrieval of patent information in
the field of chemistry is being developed through the active coopera- 
tion of International Business Machines Corp., the National Bureau 
of Standards, and other industrial consultants. When a satisfactory 
system and automation equipment is devised in this field, it is then 
proposed to extend the advancements made therein in relation to the 
utilization of patent information and patent examining procedures 
to other fields of science and to the general processing of patent in- 
formation required by its examiners and by industry. 


National Bureau of Standards (NBS) 


Some of the most important research and development work being 
done in Government is being carried on by the National Bureau of 
Standards. 

In its initial studies of the operations and problems of the NBS, 
the staff was much impressed with services which the Bureau was
qualified and prepared to render in the development of basic sciences.

There can be no doubt that this agency can become one of the most
important within the Federal science structure, if afforded the nec- 
essary administrative support and if it is provided with adequate 
funds to enable it to contribute services in the science field to its full 
potential. 

A brief summary of some of the services the NBS is prepared to 
render in all areas of Federal science operations is included under 
part I of this report, one of the most important of which is in the
evaluation and development of scientific information retrieval pro- 
grams. 


Data processing operations 


Some of the earliest and most effective utilization of retrieval sys- 
tems and electronic machines or equipment may be found in connec- 
tion with the scientific and technological data processing and retrieval 
requirements of agencies of the Department of Commerce, such as 
the Weather Bureau, Bureau of the Census, Office of Technical 
Services, Coast and Geodetic Survey, and Bureau of Public Roads, 
as well as the Patent Office. 

These agencies have prepared statements and submitted published 
materials outlining the work they have done and are doing in this 
important area. The statements referred to are included under part 
I of this report and the supporting material retained in the committee 
files. 

NATIONAL LIBRARY OF MEDICINE 


The staff study developed that the National Library of Medicine 
has more than 1 million volumes on medical and related scientific
subjects. In 1958, the library received and cataloged more than 
30,000 books and 110,000 medical articles. It publishes and distrib- 
utes an annual catalog of books, a monthly index to medical literature 
under the title “Index Medicus,” and current bibliographies on medical
subjects, and supplies information on specific medical inquiries in 
response to requests received from all parts of the world. Although 
the National Library of Medicine relies upon manual processes for 
abstracting its material, its assembly, storage, indexing, and retrieval 
(or photographing) systems are partially mechanized, resulting in 
both speed and accuracy of operation. This system is essentially a 
mechanized medical index and retrieval process, which, in the view 
of practically all qualified experts consulted by the staff, is highly 
efficient and adequate to meet the present needs of the library. 

Dr. Frank B. Rogers, Director of the National Library of Medicine,
submitted to the staff a copy of the first issue (May 1960) of Index 
Medicus, which is the product of the mechanical system which was 
demonstrated to the staff. Dr. Rogers commented that— 


while there is room for improvement, which we are prosecut- 
ing, I believe you will agree that the production of such a mas- 
sive index in convenient format, appearing promptly, is a real 
step forward. 





NATIONAL LIBRARY OF AGRICULTURE (USDA) 


The Department of Agriculture Library was established in 1862 
and by specific statute was authorized to acquire and preserve information
concerning agriculture which may be obtained from books,
correspondence, and periodicals for the purpose of assisting in carrying 
out the activities of the Department. From the beginning a Division 
of Chemistry was established as a research unit in the Department, 
because chemistry was the underlying science for much of agricultural 
research. The Department has assembled one of the world’s most 
complete collections of books, reports, and journals in general and
agricultural chemistry. This outstanding collection is strong in the
fields of botany, zoology, agricultural bacteriology, entomology, 
forestry, soils, agricultural engineering, veterinary medicine, and 
plant pathology. 





The library exchanges USDA publications with libraries and other 
institutions throughout the world. Over 500 titles of journals and 
scientific reports are received on exchange from the U.S.S.R. In addition
to acquiring these periodicals the Department indexes, catalogs,
and makes such indexes available to the scientific public. The library 
also publishes a monthly periodical entitled “The Bibliography of 
Agriculture.” Evaluation of this publication has been made by
various experts in documentation. Dean J. Shera, of Western Reserve 
University, and Prof. M. Egan, stated: 


Some of the best comprehensive subject bibliographies in
this country have been produced by special subject libraries
of the Federal Government, such as the Department of
Agriculture Library and the Library of the Surgeon General’s
Office.


The U.S. Department of Agriculture Library is used by individual 
scientists and major research installations, both in this country and 
abroad. In addition to the regularly issued bulletin, the Department 
library also publishes special bibliographies on subjects requested by 
Department research workers. It is reported that over 20,000 current 
journals and serials are received by the Department of Agriculture 
Library and a published list of these titles is also available. 

The library has pioneered in extending its services through photo- 
copying of publications which began in 1934 in cooperation with the 
American Documentation Institute. From 1946 to 1956 the U.S. 
Department of Agriculture Library had a joint arrangement with the 
American Chemical Society to provide chemists with photocopies of 
all articles listed in Chemical Abstracts. It was reported to the staff 
that the USDA Library works very closely and cooperates with the 
Library of Congress, the National Library of Medicine, and other 
leading libraries and institutions for the acquisition of information as 
well as for the dissemination of various types of information on a 
myriad of subjects. 

The services performed by the library have progressed considerably 
during recent years, but the staff has been informed that there is a 
need for further improvements, as advocated by the Director, in 
order that the library may adopt more modern systems and machines 
in perfecting its operations, particularly in retrieving and disseminat- 
ing information to which access is now limited by methods which are 
outmoded and in need of further study and development. More 
detailed information on the type of research and development made to 
date will be found under part I of this report. 

A problem of the library is the lack of adequate funds with which 
to fully implement an information retrieval system. In concluding
his presentation of the operations and requirements of the research
programs of the Department of Agriculture, before the House Com- 
mittee on Appropriations in the present Congress, Dr. Byron T. 
Shaw, Administrator of the Agricultural Research Service, pointed 
up the need for improved library service, as follows: 


Next to scientists themselves, perhaps the most important 
prerequisite of successful and productive research is avail- 
ability of scientific literature. Many people think of 
scientific publications as products of research. But they 
are also one of the scientists’ most important tools. I
would therefore like to mention briefly a proposed budget 
item that is outside the ARS budget but vitally important 
to our work. I am referring to the increase of $56,880 
contained in the proposed budget of the Department of 
Agriculture Library. 

During the past 10 years, the library has been increasingly 
hard pressed to give adequate service to our scientific personnel.
As the Department’s research program has expanded,
the number of scientists requiring library service
has increased also. At the same time, the world output of 
scientific publications has doubled. This means a vastly 
increased job of abstracting, indexing, and translating in 
order to make publications available for use by our scientists. 
More than 200,000 separate publications are received in the 
library’s exchange program each year, and many of these 
have had to be boxed and stored for lack of staff to process 
them. 

Prices of basic scientific reference books have increased 
100 to 600 percent during the past 10 years, and the library 
has been unable to acquire many revised editions and new 
compilations badly needed by Department scientists. 

I want to state that we continually receive wonderful
cooperation from the library staff. They work long and 
hard to provide our scientists with as much help as they 
possibly can. The fact remains, however, that the work 
of ARS would benefit considerably from the additional 
services that could be provided by the library with the 
increased funds requested. 


NATIONAL SCIENCE FOUNDATION (NSF) 


The National Science Foundation is authorized by law to maintain 
an active program of fostering cooperation and coordination of scien- 
tific information activities of Federal agencies and nongovernmental 
organizations. An Office of Science Information Service has been 
established within the National Science Foundation, in accordance 
with authority provided by title IX of the National Defense Education Act of
1958. This Office has documentation research as its prime objective, 
with special emphasis on the development of systems to meet scien- 
tists’ information needs. The Foundation has submitted an extensive 
report on its activities in this field, reproduced under part I of this 
report, which sets forth the fact that it is presently supporting a 
number of studies including (1) the information requirements of 
scientists; (2) information storage and retrieval systems; and (3) 
mechanical translations. 

The information submitted to the committee by the National 
Science Foundation is based in part upon repeated statements made 
to the staff that there was a serious lack of coordination and direction 
by the National Science Foundation on whom reliance has been placed 
for leadership in all civilian science fields. These allegations have 
been based upon the practice of the NSF to support the development 
of uncoordinated projects of particular agencies or segments of indus- 
try without a proper determination having been made as to our 
national requirements, or the initiation of a coordinated study and
evaluation of the systems that would be needed to meet these requirements.
Deficiencies which may exist in the programs of the NSF
(i.e., lack of coordination of Federal and nongovernmental activities) 
are probably largely due to the limitations placed upon its activities 
by statutes which do not permit it to exercise operating functions. 

Dr. Alan T. Waterman, Director of the NSF, commented on this 
staff observation, as follows: 


In this connection, the Foundation, in carrying out all of 
its programs, obtains the advice and assistance of leaders in 
the various scientific fields in order to assure that national 
needs are met. 

While section 15(c) of the Foundation Act states that the 
Foundation shall not, itself, operate any laboratories or pilot 
plants, this restriction has not in any way interfered with the 
Foundation’s assessment of national needs and its support of 
those scientific areas needing particular assistance. In the 
field of scientific information, in addition to being authorized 
and directed to foster the interchange of scientific information
among scientists in the United States and foreign countries
by section 3(a)(5), the Foundation is specifically authorized,
in section 11(g): “to publish or arrange for the publication
of scientific and technical information so as to further
the full dissemination of information of scientific value con- 
sistent with the national interest. * * *” 

Furthermore, title IX of the National Defense Education
Act of 1958 states that the Foundation, through a Science 
Information Service, shall: 

“(1) provide, or arrange for the provision of, indexing, 
abstracting, translating, and other services leading to a more 
effective dissemination of scientific information, and (2) 
undertake programs to develop new or improved methods, 
including mechanized systems, for making scientific informa- 
tion available.” 

It is, therefore, clear that the National Science Foundation 
is not statutorily barred from engaging in direct activities of 
an operating nature in the scientific information field. While 
the Foundation is not legally barred from such activities, it
is generally proceeding on the assumption that more can be 
gained by close cooperation with, and in support of, existing 
scientific information services in the United States, both 
public and private, where they are functioning effectively, 
than by direct Federal operation of such services. The 
scientific information services rendered by many of the 
scientific societies and professional institutions to the scien- 
tific community are world famous for their quality. We 
believe it is essential that the Federal Government continue 
to cooperate with, and assist, such private groups in the 
achievement of long-range solutions to scientific information
problems.


At the request of the Subcommittee on Reorganization, following 
hearings held in April and May 1959, the National Science Foundation 
made a tentative evaluation of the program being advanced by West- 
ern Reserve University, which had been brought to the attention of 








26 DOCUMENTATION OF SCIENTIFIC INFORMATION 





the subcommittee, and awarded a contract to the University to further 

evaluate its mechanized systems as applied to the field of metallurgy, 

which is being carried on in cooperation with the American Society for 

Metals. Similar contracts have been awarded to the American 

Chemical Society and Chemical Abstracts in the field of chemistry. 
In commenting on the contraet with WRU, the, NSF stated: 


Members of the OSIS staff and the WRU staff had known 
each other personally, and had been in professional contact 
for some years prior to the 1959 hearings mentioned in the 
paragraph quoted above. The WRU staff had previously 
submitted proposals to, and had received grants from, the 
National Science Foundation. The present WRU activity 
being supported by the Foundation is resultant from a WRU 
proposal not the same as that presented to the committee 
mentioned. 

OTHER FEDERAL AGENCIES 


In a report to the committee relative to reorganization effected 
during the calendar year 1959, certain of the departments and agencies 
reported advancements made in new electronics procedures or data 
processing, as follows: 


Department of Agriculture.—Reported that— 


A Data Processing Division was established to provide cen- 
tralized data service within the Agricultural Research Service. 
Data processing equipment and associated personnel pres- 
ently dispersed organizationally and physically are being 
brought together as rapidly as is feasible. 


Department of the Treasury.—The Acting Secretary reported:


The reduction in employment in the Bureau of the Public 
Debt is directly attributable to the adoption of the punched- 
card form of savings bonds which has made possible the 
adoption of machine processes in lieu of former manual methods.
The new form of savings bonds was introduced in
October of 1957. Since that date a total reduction in per- 
sonnel of over 500 employees can be attributed to the new 
procedures. The Treasury Department office in Chicago is 
the main beneficiary of the new electronic data processing 
system. This office is able to show a reduction of 179 em- 
ployees due to this change. The Parkersburg office accounts 
for the major part of the remaining reduction. The decrease 
in employment in that office is due to refinements and im- 
provements in the new electronics procedures. 

Although the number of checks paid by the Office of the 
Treasurer in 1959—398 million—did not diminish from the 
previous year, there was a net decrease of 19 employees in 
the Check Payment and Reconciliation Division. This 
accomplishment was made possible by a continuing appraisal
of procedures followed, better utilization of personnel, and 
more efficient programing of the electronic data processing 
equipment. 





The Department also reported that a reduction of 859 employees in 
the Internal Revenue Service during 1959 was due, in part, to— 


shifting tax returns processing work from the districts where 
it was done manually to a mechanized operation in the service 
centers— 


and that, subject to the availability of funds— 


plans are being completed for the establishment during 1960 
of a pilot regional service center under IRS’ long-range auto- 
matic data-processing system. 


Department of Health, Education, and Welfare.—The Secretary of
Health, Education, and Welfare reported to the committee that a cen- 
tral planning staff was established in the Office of the Director to study 
the entire claims process in the Bureau of Old-Age and Survivors 
Insurance— 


with particular attention to the application of electronic data 
processing, and to put into operation such improvements as 
the study results indicate are feasible. 


Federal Communications Commission.—The chairman reported that 
an Automatic Data Processing Study Group, working with the staff 
of the National Bureau of Standards to investigate the feasibility of 
automating Commission processes, was established during 1959. 

Interstate Commerce Commission.—The Commission reported that,
during 1959— 


in connection with efforts to mechanize and improve various 
operations to the fullest extent possible, the payroll and sav- 
ings bond processes were converted entirely to electronic 
computing machine operation. This conversion from a 
manual system resulted in a reduction of seven payroll clerk 
positions. Records destroyed or transferred to Federal Rec- 
ords Centers totaled 10,472 cubic feet, saving approximately 
$55,400 in replacement value of equipment and value of 
space. 


Railroad Retirement Board.—The Board informed the committee 
that part of the decreases in personnel of 109 employees during 1959 
can also be attributed to adjustments which have been made in
preparation for the installation of an electronic data processing system
in 1960. 


NATIONAL ACADEMY OF SCIENCES—NATIONAL RESEARCH COUNCIL


The Academy-Research Council established a new Office of Docu- 
mentation during the summer of 1959. This action was the result of 
long planning and consideration and was in answer to a need repeatedly
expressed by groups in and out of the Academy-Research Council.

The Director of the Office is Dr. Karl F. Heumann, formerly Research
Director, Chemical Abstracts Service. Dr. Heumann was
also Director of the Chemical-Biological Coordination Center of the 
Academy-Research Council from 1952 to 1955. His training is in
chemistry and he has had a long interest in problems of scientific
information. 











In response to a request from the staff, Dr. Heumann submitted 
on April 4, 1960, the following comments relative to the operations 
of the Office of Documentation: 


The scientific literature has grown at an unprecedented 
rate in recent years, and many scientists have felt that the 
difficulties of searching and using information have similarly 
increased, even to the point where bibliographic control 
was effectively lost. As a response to this situation, new 
methods and techniques have been proposed for dealing with 
this ever-growing collection of information. New classifica- 
tion and indexing schemes, new electronic equipment, new 
methods of storing information on film or magnetic tape— 
all these have been developed in a discipline loosely covered 
by the term “documentation.” 

The Office of Documentation has three major areas of in- 
terest: (1) Advice to the National Science Foundation and 
others as appropriate in broad problems of scientific docu- 
mentation, including the recording, storage, retrieval, and 
dissemination of information to serve the needs of science; 
(2) provision of a mechanism for the participation of U.S. 
scientists and documentalists in international activities re- 
lating to scientific documentation; and (3) advice and assist- 
ance to the several activities of the Academy—Research 
Council in the documentation problems that they encounter 
from time to time. Close liaison is being maintained with 
the National Science Foundation’s Office of Science Informa- 
tion Service and with other interested groups. 

An advisory committee, made up of scientists and docu- 
mentalists, whose chairman is Dr. Elmer Hutchisson, director 
of the American Institute of Physics, has been appointed to 
serve the Office. 

In furthering the participation of U.S. scientists and docu- 
mentalists in international documentation activities, the 
Office has taken the lead in organizing a U.S. National Com- 
mittee for the Federation Internationale de Documentation 
(FID). In the near future, the Academy—Research Coun- 
cil will become the U.S. adhering body to the FID, and at 
that time the U.S. National Committee will be formally 
constituted. The aim of this National Committee, made up 
of representatives from national societies and Government 
agencies, will be to advise the Academy-Research Council
on its role in FID programs and other international documentation
affairs.


COOPERATION OF INDUSTRY


In addition to the Federal agencies contacted in the appraisal of 
the science information retrieval programs, the assistance of certain 
industries and institutions which the staff had been informed have 
made notable progress in this field was enlisted. Those selected 
were reported to have advanced programs for the development of 
tabulating, indexing, and mechanized science retrieval systems and
had made progress in perfecting automatic retrieval equipment. The 
major objective was to obtain representative samplings of information
on programs being operated by private industry which may have
potential for utilization by Government agencies in correcting de- 
ficiencies in existing methods for assembling, translating, and dis- 
tributing available science information. The objective is to provide 
appropriate Federal agencies with information which will enable them 
to evaluate systems and methods now in use or under development by 
industry, in the field of science documentation, and, when found to 
be warranted in the public interest, to encourage and support further 
development of appropriate programs through contracts to those 
groups best qualified in this field, in order to establish centralized 
science information centers in areas essential to the operations of the 
Government. 

The staff, following studies of the problems involved, has con- 
cluded that a Federal center of documentation, originally proposed 
in the 85th Congress, is not feasible at this time in view of opposition 
of scientists to such a centralization of science information activities 
under Federal jurisdiction and control. The approach to the prob- 
lem is now being made on a selective basis covering specialized fields 
of science, such as in chemistry and metals. Eventually, however, 
the Federal Government may be compelled to establish a center for 
information exchange in order to bring all areas of science into a central 
clearinghouse, with authority to provide services to Federal agencies 
or industries operating in several areas of science as to where required 
information is being stored and may be available, and to determine 
whether or not other special centers should be established to service 
specific science programs. 

The staff has endeavored to set forth in this report some of the 
requirements of the various agencies of the Government engaged in 
scientific activities so that competitive contracts may be offered by 
these agencies, or through the National Science Foundation, to quali- 
fied industrial groups to assist the Government in perfecting these 
types of programs. 

These industries, selected as the most advanced in automation and 
representative of industry as a whole, have expended considerable 
time and funds in analyzing, testing, and developing information and 
data retrieval systems; some of them directed toward serving the needs 
of other industries and the Federal Government, and others directed 
primarily toward meeting their own needs. The staff received excel- 
lent cooperation from these industries and institutions. A number of 
them sent representatives to Washington to confer with staff members, 
while others furnished brief summaries of their programs and recom- 
mendations for committee consideration, which are included in part I 
of this report. 

The group included in this report, which has initiated programs for 
their own utilization, includes the E. I. du Pont de Nemours Corp., 
the Esso Research & Engineering Co., the Bell Telephone Labora- 
tories, and the Smith Kline & French Laboratories. These industries 
have taken the lead in developing information retrieval processes in 
their respective fields of chemistry, petroleum, communications, and 
pharmaceuticals. Their relatively large operations are directed 
primarily toward solving problems relating to the procurement of 
scientific information required in their specialized fields, and making
available scientific and technological data and information from all
sources, as well as information developed within their own affiliates,
to scientists and researchers in their own laboratories. 

The developers of machine tabulating systems, and manufacturers 
of mechanized equipment, such as Documentation Incorporated, Itek 
Corp., Western Reserve University, General Electric Co., Interna- 
tional Business Machines Corp., Remington Rand Univac, Eastman 
Kodak Co. (Recordak Corp.), and Jonker Business Machines, Inc., 
were also most cooperative in providing information to the staff. 

There was general consensus of opinion among representatives of 
these industries, and others contacted, that there was an urgent need 
for the improvement of automation in industry and Government. 
All participating groups expressed a desire to cooperate in correcting 
the existing deficiencies through the exchange of information and by 
assisting in improving processes for the retrieval of scientific informa- 
tion and research data required to keep this Nation in a competitive 
position in technological development throughout the world. 

With certain exceptions, there was general agreement among those 
officials representing industry and government that the basic
problem related not only to the development of new automation
and mechanized equipment but also, perhaps more precisely,
to the development of more effective means of describing the
information contained in a given document, and of describing the
information desired by an inquirer, so that these two descriptions
may be capable of being matched and pertinent information obtained
by the inquirer. This would require, among other things, formu- 
lating engineering systems and special codes or thesauri for better 
and more rapid utilization of such equipment. A major problem 
is to develop the needs and requirements of any given operation, 
governmental or private, and the systems engineers and manufac- 
turers of mechanized equipment will then be in a position to provide 
the necessary automation to achieve the objective sought, within the 
limits of coding and input abilities which are made available. 
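The matching of a document description against an inquirer's description can be illustrated with a short sketch in present-day Python. It is not drawn from any system surveyed in this report: the thesaurus entries, the document identifiers, and the function names `describe` and `match` are all assumptions made for illustration. The sketch reduces free-language terms to preferred descriptors and then selects the documents whose descriptors cover every descriptor in the inquiry.

```python
# Hypothetical thesaurus: free-language term -> preferred descriptor.
THESAURUS = {
    "aeroplanes": "aircraft",
    "airplanes": "aircraft",
    "aircraft": "aircraft",
    "corrosion": "corrosion",
    "rusting": "corrosion",
    "steel": "steel",
}


def describe(terms):
    """Reduce free-language terms to a set of preferred descriptors.

    Terms that are absent from the thesaurus are ignored.
    """
    return {THESAURUS[t.lower()] for t in terms if t.lower() in THESAURUS}


def match(query_terms, documents):
    """Return the ids of documents whose descriptors cover the whole inquiry."""
    wanted = describe(query_terms)
    if not wanted:
        return []
    return sorted(
        doc_id
        for doc_id, terms in documents.items()
        if wanted <= describe(terms)
    )


# Hypothetical collection: document id -> terms assigned by an indexer.
DOCUMENTS = {
    "D-1": ["Airplanes", "Corrosion"],
    "D-2": ["Steel", "Rusting"],
    "D-3": ["Aircraft", "Steel"],
}

print(match(["aeroplanes", "rusting"], DOCUMENTS))  # ['D-1']
```

The inquiry and document D-1 use different words, yet they match because both are reduced to the same descriptors before comparison. This is the role the paragraph above assigns to special codes or thesauri.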

In this respect, the study developed, in summary, the fact that 
practically all of the machine tabulation equipment already available, 
or being developed by the industries engaged in the field of information 
retrieval, offers some potential for utilization in producing the required 
results, once the systems engineering process is selected and com- 
pleted. The problem appears to be which of these programs being 
advanced, calling for the use of one or more tabulating or sorting
machines, or other automation systems which are now available, would 
be the most productive and would provide the required data or 
information with the least possible delay, within reasonable cost 
limitations. 

Based upon information developed by the staff, the Du Pont Co. 
appears to have done more research and appraisal of systems and 
machine retrieval equipment in use or being developed than any other 
company in effecting improvements in its own system. The company 
retains (or has reviewed in detail the work of) a number of such firms 
or institutions, with competing systems, as consultants. By process of 
trial and elimination, some features of several such systems have been 
accepted and various mechanical devices utilized in its program, while 
others have been rejected. An outline of the approach to the prob- 
lems involved is set forth in part II of this report. 




Some of these industries pointed up other problems which must be 
solved if there is to be the necessary teamwork between Federal 
agencies operating in the scientific or research fields and industries 
engaged in similar activities which are qualified and willing to co- 
operate. One of the major obstacles presently encountered by large 
industries well advanced in the field of research and development, but 
which do not have or seek Federal contracts, is the withholding of 
information or data generated through Federal agencies, particularly 
the military, under a regulation which requires that they must prove 
a “need to know.” The inevitable conclusion is that the Federal 
Government is not, under this policy, interested in assisting American 
industry as a whole in solving their own problems to improve their 
services to the economy of the Nation; that, if they are not actively 
engaged in performing some direct service to the Federal Government 
or carrying on some specific activity which contributes directly to an 
agency to which a request for information is directed, they are not 
entitled to the required information which is supposedly being gen- 
erated in the public interest and at public expense. 














Part I 
ACTIVITIES AND PROGRAMS OF FEDERAL AGENCIES 


The following details of operations were prepared by or under the 
direction of the officials in charge of the science information processing 
functions of the respective agencies indicated. Staff comments are 
also included under the section dealing with the program of the 
Library of Congress. 


ARMED SERVICES TECHNICAL INFORMATION AGENCY 


The Commander and Director of ASTIA, Col. Woodrow W. 
Dunlop, USAF, arranged for a briefing of the staff, at which infor- 
mation and research specialists from the Army, Navy, and Air Force 
were present. The purpose of the briefing was to provide information
upon which the staff could make an evaluation of scientific infor- 
mation and data processing systems operated by the major agencies 
within the Department of Defense. 

A preliminary report on “Automation of ASTIA,” dated December
1, 1959, outlined the transition in ASTIA from a manual
to an automated operation. The report stated that:


ASTIA is at the crossroads; the point at which the phase-in
of the new overlaps the phaseout of the old. Automatic 
data processing equipment is being installed and programing 
under the new system is well along the way. Schedules are 
being met. The long-used manual operations are beginning 
to step aside and ASTIA will soon be in operational readiness 
for automation. 

The task of transition is an enormous one. It involves 
one of the largest collections of scientific and technical 
reports in the free world; reports that reflect the results of 
the major part of the U.S. Government research and development
program during and since World War II. The task is
complicated by the continuing receipt of new reports from 
current programs arriving at the annual rate of about 30,000 
titles, and by receipt of almost 2,000 separate requests for 
reports from our holdings every working day. Add to this 
the fact that there was little precedence for such a venture, 
no proven system to follow, almost every step a new one to 
be studied and carefully planned, and the enormity of the 
task becomes apparent.

However, a spirit of cooperation and determination has 
prevailed among the people of ASTIA to the extent that not 
only are the automation processes being effected, but the 
current workloads are being met. In fact, large backlogs 
that once plagued ASTIA are actually melting away. 







It is the long-range objective to eventually, through automation,
provide a comprehensive bibliographic information
and announcement service in such a manner that the scientist
or engineer can have at his fingertips, at any given time,
information in the ASTIA collections pertinent to his needs.
Also envisioned is the automation of essentially every step
in filling a request for a report, including its automatic
reproduction. This report relates the beginning of ASTIA’s
effort in these directions.


Following the briefing at ASTIA headquarters, Arlington Hall 
Station, Colonel Dunlop submitted, on February 5, 1960, the following
additional information relative to progress being made in implement- 
ing the new program: 


As mentioned in the preliminary report, ASTIA’s elec- 
tronic data processing system will be implemented in three 
stages. The first stage employing punchcards will be put
into operation on the 15th of this month. This first stage 
will provide ASTIA the following capabilities: 

1. Automatic request validation to determine the need- 
to-know and security clearance of the nonmilitary requester. 

2. Automatic inventory control which will enable us to 
improve our inventory management.

3. Complete and accurate accountability of both classified 
and unclassified documents. 

4. Mechanized quarterly and annually cumulative subject 
and corporate author indexes to our Technical Abstract 
Bulletins that are published twice monthly to announce 
acquisition of new documents. 

The second stage of ASTIA’s automation incorporating 
magnetic tapes will be placed into operation on July 1, 1960, 
and will provide the following capabilities: 

1. Automatic check for incoming documents to determine
if they have been previously processed into the system. 

2. Automatic identification of documents to fill requests 
that do not cite ASTIA catalog numbers. 

3. Mechanized search and retrieval of information—the 
identification of documents and catalog number in the ASTIA
system on any subject or combination of subjects.
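The three second-stage capabilities listed above can be sketched in present-day Python. This is an illustration only and does not describe ASTIA's actual tape programs: the `Catalog` class, its method names, the use of originator and report number as the duplicate key, and the catalog numbers shown are all assumptions.

```python
from collections import defaultdict


class Catalog:
    """Toy catalog illustrating duplicate checks and subject search."""

    def __init__(self):
        self.by_source = {}               # (originator, report no.) -> catalog number
        self.postings = defaultdict(set)  # subject term -> set of catalog numbers

    @staticmethod
    def _key(originator, report_no):
        # Normalize case and spacing so trivially different citations agree.
        return (originator.strip().lower(), report_no.strip().lower())

    def add(self, catalog_no, originator, report_no, subjects):
        """Process an incoming document unless it is already held.

        Returns the catalog number under which the document is held.
        """
        key = self._key(originator, report_no)
        if key in self.by_source:
            return self.by_source[key]    # previously processed
        self.by_source[key] = catalog_no
        for subject in subjects:
            self.postings[subject.lower()].add(catalog_no)
        return catalog_no

    def identify(self, originator, report_no):
        """Find the catalog number for a request that does not cite one."""
        return self.by_source.get(self._key(originator, report_no))

    def search(self, *subjects):
        """Catalog numbers indexed under every one of the given subjects."""
        if not subjects:
            return []
        sets = [self.postings.get(s.lower(), set()) for s in subjects]
        return sorted(set.intersection(*sets))


catalog = Catalog()
catalog.add("C-0001", "Lab A", "TR-1", ["Radar", "Antennas"])
catalog.add("C-0002", "Lab B", "TR-7", ["Radar", "Propagation"])
print(catalog.add("C-0003", "lab a", "tr-1", ["Radar"]))  # C-0001 (duplicate)
print(catalog.identify("Lab B", "TR-7"))                   # C-0002
print(catalog.search("radar", "antennas"))                 # ['C-0001']
```

A search on a combination of subjects is the intersection of the sets of catalog numbers posted under each subject, which is why the sketch keeps one set per term.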

In the third stage to be implemented in October of this
year, a random access capability will be added which will
greatly reduce the processing times under the card and tape
operations. At that time ASTIA will have a much greater
machine capacity for bibliography and reference work.
ASTIA is proceeding with plans to expand the third stage into
a fully integrated data processing system. The primary ob- 
jective to be realized from such a system will be the ability 
to automatically print out all cataloging and abstracting 
information which will provide a truly automatic biblio- 
graphic service. Attached are two examples of machine 
bibliography formats. The first example cites ASTIA 
document numbers only and must be supplemented by con- 
ventional catalog cards. This form of bibliography will 
be compiled under the second stage of ASTIA’s automation
to become operational July 1, 1960. The second example is
a bibliography of the type that ASTIA will be able to provide 
under the tape configuration as soon as the bibliography 
data are converted into machinable form. Classified 
portions have been deleted to obviate the need for imposing 
security classification on the example. 

Although our preliminary report is dated December 1, 
1959, significant progress has been made since that time. 
The Remington Rand Solid State 90 computer has been
installed at ASTIA. Personnel are on board and have been 
trained. The primary operational programs have been 
tested and debugged. All phases of the automation program 
are progressing according to plan and will become operational 
on the 15th of this month in accordance with the original 
schedule. ASTIA is quite proud of this achievement. 
Instances are exceptionally rare where automation on such 
a scale has been placed into operation in accordance with 
its originally established schedule. 

In the information retrieval area, the final edit of a 
“Thesaurus of Retrieval Terms” is underway. This is the 
“Bible” for the ASTIA retrieval system. It will be published
and distributed in April to the Department of Defense
research and development activities and their civilian 
contractors. 

Significant inhouse progress has been made in the assign- 
ment of retrieval terms to the ASTIA collection of docu- 
ments, and in the conversion of these terms into machinable 
form through a contractor. 

Progress has also been made in the manual operation in 
preparation for automation. Backlogs of documents to be 
processed into the ASTIA system have been practically
eliminated. New documents are moving into the processing 
pipeline as they are received and important documents are 
normally announced within 60 working days after receipt 
in the ASTIA Technical Abstract Bulletin. 


On May 2, 1960, Colonel Dunlop forwarded to the committee an 
advance copy of the “Thesaurus of ASTIA Descriptors,” first (May)
edition, commenting that: 


All of the terms listed in this thesaurus have appeared in 
the reports literature accumulated over the years by ASTIA. 
Hence, the vocabulary is authoritative in nature and 
represents one of the first products of our automation 
program. All terms are now in machine language, permitting 
computer preparation of the thesaurus. 


ATOMIC ENERGY COMMISSION


The Assistant Director for Technical Information Service, Division 
of Information Services, of the AEC, Mr. Melvin S. Day, and his 
scientific adviser, Mr. Van A. Wente, were most cooperative with the 
committee staff in outlining the science information processing, 
abstracting, and bibliographical services being rendered by that 
agency. At the request of the staff, the following detailed summary
of the documentation and information retrieval programs now operating
in AEC was submitted for inclusion in this report:


The effective dissemination of technical information within 
the Government atomic energy projects and among science 
and industry generally, has been a prime concern of the 
Atomic Energy Commission since its establishment. As 
directed by the Atomic Energy Acts of 1946 and 1954 the 
AEC 


(1) Records and reproduces as rapidly as possible the
scientific and technological data developed in its own 
research and technical programs; 

(2) Maintains acquisition and exchange programs 
with nuclear research centers throughout the world; 

(3) Compiles the world’s most comprehensive scien- 
tific information collection in the field of nuclear science; 

(4) Bibliographically organizes, packages, and dis- 
tributes for the use of all peoples this body of knowledge 
of the atom and its application to peaceful purposes. 


The documentary products of the AEC technical informa- 
tion program have assisted materially in establishing the basic 
framework of the atomic energy information programs of 
other governments; the AEC bibliographic tools and pro- 
cedures are the world’s standards in nuclear science. The 
Commission publishes its research results in more than 6,000 
technical reports and published articles annually, of which 
4,000 are issued without security classification and are made 
available to the world’s scientific and industrial communities; 
the additional 2,000 papers generated by AEC scientists and 
engineers are published in the professional and technical 
journals each year. 

To make available in the most useful form the rapidly 
increasing atomic energy literature produced in this country 
and abroad, an extensive system of preparing and supplying 
reference tools tailored to the needs of the users has been
maintained by the Commission’s Technical Information 
Service since 1947. Abstract journals, both classified and 
unclassified, covering the AEC-originated literature and 
that coming from other sources around the world are pub- 
lished biweekly. Bibliographies in specialized subject 
areas are compiled and issued. State-of-the-art books, pro- 
ceedings of technical meetings, and review journals are also 
published. Technical films and translations of selected 
foreign reports are prepared and disseminated. All useful 
AEC technical reports and engineering drawings are placed 
on sale to the public in full-size or microcopy form; more 
than 300,000 copies are sold annually to the public. 

In addition to providing information products and services 
to its own laboratories and those of other Government 
agencies and their contractors, the AEC provides complete 
collections of AEC unclassified research and development 
materials to more than 160 depository libraries in this 
country and overseas for the use of the world’s scientific
community. 







The Commission has always participated in the develop- 
ment of new techniques and systems of information analyses, 
storage, and retrieval. Beginning in 1949, the Commission 
sponsored the development of prototype equipment by 
RCA for flatbed facsimile transmission of research materials, 
tested first between AEC facilities in Oak Ridge and later 
in collaboration with the Library of Congress, National 
Institutes of Health, and the Army Medical Library. 
Support was also given to research and development work 
on the rapid selector carried on by the Department of 
Agriculture. 

As soon as machine improvements are proved feasible 
they are adopted for use in the AEC information program. 
In 1949 the AEC developed one of the first IBM installations 
for the control of classified documents and later assisted 
various AEC laboratories in the development of similar 
installations. 

For several years this IBM equipment was used for the 
preparation of author and report number indexes for Nuclear 
Science Abstracts as well as for special listings in other AEC 
bibliographic tools. Experiments were also conducted with 
this punch card equipment for the storage and retrieval of 
technical information. The latter experiments predated the 
current electronic computers and although the experimental 
results proved impractical for large scale application they 
did produce valuable experience and background for current 
utilization of high speed computers. 

In 1949 the AEC cooperated with the IBM Corp. in the 
development of ‘demountable’ type-bars for IBM com- 
posing machines. 

Tests were conducted at four AEC libraries on coordinated 
indexing and the AEC has cooperated with the National 
Science Foundation in the development of an experimental 
classification and coding system for AEC documents. The 
ultimate purpose is to develop a coding scheme which can 
be utilized by a small digital computer which would be 
economically feasible for smaller libraries. 

In 1958 the AEC cooperated with the University of 
Virginia in its research on the utilization of closed-circuit 
television between libraries. 

Late in 1958, through the practical application of existing 
equipment and techniques the AEC developed a simple, 
practical system for the speedy preparation of separate 
indexes for each biweekly issue of Nuclear Science Abstracts 
and for the quarterly, semiannual, and annual cumulations 
of these indexes. The system was proved in actual opera- 
tion during 1959 for the 24 regular issues of Nuclear Science 
Abstracts and the quarterly, semiannual, third quarterly, 
and annual index cumulations. The annual index (1,650 
pages) for NSA volume 13 covering all abstracts announced 
through December 31, 1959, was distributed in February
1960. This same technique is now used by the AEC for all
of its journal indexes and other special cumulated listings. 




The National Library of Medicine beginning in 1960 is 
using a similar system for its production of Index Medicus 
and the American Institute of Physics utilizing the detailed 
AEC procedures will produce in 1960 the annual index to the 
Journal of Chemical Physics. 

The AEC is continuing its studies on the application of 
computer machine processing of scientific and technical data. 
A major cost factor involved in the conversion to a machine 
system is the cost of input of data into the machine. Cur- 
rently all data must be key punched which is a costly and 
time consuming operation. Obviously, direct conversion of 
data by machine scanning would be infinitely superior. An 
AEC laboratory has devised a system for pattern discrimina- 
tion, characterization, and mensuration using an IBM 704 
computer. To evaluate the discriminating capability of the
system, typewritten numerals, hand-block print, and handwritten
script characters have been used as patterns of
respectively increasing complexity and individual variability.
Recognition is accomplished on a character-by-character
basis and also by contextual relationships.
The initial experimental results are encouraging and develop- 
ment work is proceeding. 

During 1960 the Commission expects to convert, for testing 
purposes, to a machine retrieval system, a specialized segment 
of its atomic energy literature. The machine data system 
which eventually will be adopted by the AEC for its entire 
program will be selected on the basis of its effectiveness, 
economy, and interchangeability with information centers 
for other scientific disciplines. 

This paper thus far has summarized the main activities of 
the AEC’s technical information program. To provide a
clearer picture of the AEC technical information products 
and bibliographic tools a more detailed description of each is 
provided in the following paragraphs. 


Nuclear Science Abstracts (NSA) 


A semimonthly abstract journal published by the U.S. 
Atomic Energy Commission serves the world’s scientific and 
engineering community by providing abstracts of the litera- 
ture on nuclear science and technology as well as immediate 
and detailed indexes to that literature. The coverage of Nu- 
clear Science Abstracts is worldwide and includes: (1) research 
reports of government agencies, universities, and industrial 
research organizations; (2) articles appearing in technical 
and scientific journals; (3) books, sence tions, and patents. 
It has been published since 1947 and is the only abstract
journal devoted solely to nuclear science. Its expanding 
scope and growth have paralleled the growth of nuclear enter- 
prise throughout the world. In 1955 NSA carried 8,004 ab- 
stracts and it is expected that from 25,000 to 26,000 abstracts 
will be included in the current volume of NSA (vol. 14). 

Abstracts in each issue of Nuclear Science Abstracts are 
grouped under broad subject categories for ease of scanning. 
In addition NSA is equipped with four indexes as a searching
tool to insure the most rapid access to the detailed contents
of the original articles. 

NSA Indexing Pattern.—Each semimonthly issue of the 
current NSA volume carries four indexes: (1) corporate 
author, (2) personal author, (3) report number (with avail- 
ability information), and (4) subject. These indexes are 
cumulated and published as special index issues four times 
each year—the first quarter cumulation covers issues 1-6; 
the semiannual covers issues 1-12; the third quarter covers 
issues 13-18; and the annual covers issues 1-24. A signifi- 
cant “breakthrough” in production techniques enables the 
AEC’s Technical Information Service to publish each issue 
of Nuclear Science Abstracts with its expanded indexes 
without having to extend the processing time for producing 
the journal. In addition, the first and third quarter cumu- 
lated index issue is distributed about 3 weeks following the 
close of the quarter; the semiannual index is distributed late 
in July; and the annual index for the 1959 NSA was dis- 
tributed early in February 1960. The new production 
techniques have actually lowered the unit production costs 
of issuing indexes and permit rapid publication of cumulated 
indexes on a schedule unequalled by any other major abstract 
journal. 
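The cumulation schedule described above can be written out as a short sketch in present-day Python. Only the issue ranges come from the text. The data layout (a mapping from issue number to a list of heading and abstract-number pairs), the sample entries, and the names `CUMULATIONS` and `cumulate` are assumptions for illustration.

```python
# Issue ranges for each cumulated index, as stated in the text.
CUMULATIONS = {
    "first quarter": range(1, 7),    # issues 1-6
    "semiannual": range(1, 13),      # issues 1-12
    "third quarter": range(13, 19),  # issues 13-18
    "annual": range(1, 25),          # issues 1-24
}


def cumulate(issue_indexes, name):
    """Merge the per-issue index entries covered by the named cumulation.

    issue_indexes maps an issue number to a list of
    (heading, abstract_number) pairs; issues not yet published
    may simply be absent.
    """
    merged = []
    for issue in CUMULATIONS[name]:
        merged.extend(issue_indexes.get(issue, []))
    return sorted(merged)


# Hypothetical entries for four issues of one volume.
issues = {
    1: [("Uranium", 10)],
    6: [("Boron", 55)],
    7: [("Argon", 60)],
    13: [("Cesium", 130)],
}

print(cumulate(issues, "first quarter"))  # [('Boron', 55), ('Uranium', 10)]
print(cumulate(issues, "third quarter"))  # [('Cesium', 130)]
```

Note that the third quarter cumulation, as described, covers only issues 13-18 and does not repeat the first half of the volume, whereas the semiannual and annual cumulations both begin again at issue 1.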

Prior to the use of the new techniques the Technical Information
Service utilized standard “cold composition” in
preparing for offset printing the indexes for Nuclear Science 
Abstracts. The production costs and the production rate 
limitations of the standard techniques limited the NSA in- 
dexing pattern to two indexes per issue (personal author 
and report number) and to two index cumulations per year
(semiannual and annual) which did include subject indexes.
The author index was a simple IBM accounting machine 
tabulation of the authors’ names and the abstract numbers. 
For ease of cumulation the index entries for the subject 
index and the report number index were typed on individ- 
ual white cards. These cards were manually arranged in 
columns and superimposed one on the other so that only 
the typed image was visible. The cards were fastened to 
page makeup sheets for photography. The high cost of this
manual “shingling” technique was matched by the equally
high cost of manual interfiling, and the “shingling” and
“unshingling” required for successive cumulations became excessive. It
was evident that a new production approach utilizing machine
operations had to be devised. The new AEC techniques
have eliminated the time consuming and costly manual
operations in favor of speedy and economical machine
operations. The new system utilizes equipment readily 
available and is successful through the effective adaptation 
and combination of products of several manufacturers. In 
its simplest terms the index production system provides 
for— 

(1) Typing of index entry in a prescribed area of an 
IBM accounting machine punchcard;

(2) Machine coding the punchcard so that the thousands
of index entries can be machine sorted and arranged;

(3) Machine controlled photographing of the cards 
at a rate of 230 cards (lines) per minute; 

(4) Arrangement and makeup of the resultant nega- 
tives into plate negatives for photo offset printing. 
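The four steps above can be mirrored in a short sketch in present-day Python. Only the rate of 230 cards per minute comes from the text. The `Card` fields, the format of the sort code, the sample entries, and the column length are assumptions for illustration and do not describe the actual card layout.

```python
from dataclasses import dataclass

CARDS_PER_MINUTE = 230  # photographing rate quoted in the text


@dataclass(frozen=True)
class Card:
    sort_code: str  # machine-punched code (hypothetical format)
    line: str       # typed index entry, one line per card


def arrange(cards):
    """Order the cards by punched code, as machine sorting would."""
    return sorted(cards, key=lambda card: card.sort_code)


def photographing_minutes(card_count):
    """Camera time needed at one card (one line) per exposure."""
    return card_count / CARDS_PER_MINUTE


def make_columns(cards, lines_per_column):
    """Split the ordered lines into columns for page makeup."""
    lines = [card.line for card in cards]
    return [
        lines[i:i + lines_per_column]
        for i in range(0, len(lines), lines_per_column)
    ]


deck = [
    Card("0300", "Zirconium, 412"),
    Card("0100", "Boron, 17"),
    Card("0200", "Uranium, 230"),
]
ordered = arrange(deck)
print(make_columns(ordered, 2))
# [['Boron, 17', 'Uranium, 230'], ['Zirconium, 412']]
print(photographing_minutes(2300))  # 10.0
```

Because each card carries both the typed line and its punched code, a cumulation needs no retyping: the decks for the issues concerned are merged, sorted again, and photographed.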

The immediate urgent need precluded lengthy research, 
development, and testing phases by the Technical Infor- 
mation Service prior to installation of the new NSA index 
listing system. Production experience to date has been 
satisfactory with minor improvements in the system being 
made on a day-to-day basis. The NSA index listing system 
is successful. The AEC hopes that the NSA accomplishment 
will stimulate further developments in the area of machine 
produced cumulated listings. At this stage the NSA system 
has been designed to provide maximum flexibility in adapting 
to future developments. 

All equipment is standard and is available commercially. 
The following equipment is used: 

1. Composing machines type the reproduction image of
the index entries on white punchcards die-cut with a
circular hole on the left edge and a square hole on the right edge
to insure proper horizontal and vertical registration. The
Technical Information Service uses one of two types of
machines depending upon the index entry and the type face
desired: (a) International Business Machine proportional
spacing typewriter having clamp-type platens and (b)
Vari-Typer line composing machine, model LC-100, which
composes only one line of copy per card. The bold face No. 2
IBM type face is reserved for main headings in the indexes 
which do not have cross reference modifiers. Main headings 
for “see references” are composed on the Vari-Typer. Only
one line of copy is composed in the image area on each die-cut 
card, to provide for maximum flexibility in utilizing improved 
equipment which will be developed in the future. 

2. Standard International Business Machine (IBM) ac- 
counting equipment is used to keypunch, sort, and arrange 
the composed cards carrying the reproduction image of the 
index entries. 

(a) Keypunch, model 024, keypunches the card for 
automatic machine sorting and arrangement; 

(b) an alphabetical interpreter, type 522, model 1,
interprets for checking the numeric codes of the main
subject headings and the main corporate index headings
on the master set of IBM cards for these two indexes;

(c) a sorter, model 083, sorts, arranges in sequence, or
decollates 1,000 cards per minute;

(d) a collator, model 089, collates and merges the
cards.

3. A mechanized camera (Recordak Listomatic Camera, 
model No. 1) photographs cards at the rate of 230 cards per
minute. Both film and positive photographic paper are 
available for use with the camera. Film kits are available
which permit desired photographic reduction and govern the 
number of vertical lines per inch which the camera will 
photograph. The width of the horizontal area photographed 
is limited by the area required for the numerical and
alphabetical coding. Photographic reduction of copy can be
obtained but enlargement is not available.

4. An automatic photographic developing and drying 
machine (Oscar Fisher Spray Processal, model G-12)
processes the roll paper or roll film product. 

Advance planning for subject and corporate indexes.—The 
Atomic Energy Commission has published two basic
authority listings, TID-5001 (second revision), Subject 
Headings Used in the Catalogs of the U.S. Atomic Energy 
Commission (Unclassified); and TID-5059 (third revision), 
Corporate Author Entries Used by the Technical Informa- 
tion Service in Cataloging Reports. Both listings are avail- 
able from the Office of Technical Services, Department of 
Commerce, Washington 25, D.C. These compilations list 
the main headings used in the subject and corporate author 
indexes of the NSA. 

All main headings in both publications have been assigned 
code numbers to take maximum advantage of IBM equip- 
ment card sorting capabilities. Subject heading entries in 
TID-5001 were assigned a six-digit code (1-999,999) and
corporate entries in TID-5059 were assigned a five-digit code
(1-99,999). Code numbers were spread throughout the six- 
or five-digit field in each index to provide room for normal 
growth. Additional room beyond this normal expansion 
has been provided by two digits in a decimal field. The 
numeric codes permit the alphabetic arrangement of all 
main headings mechanically without double sorting for alpha- 
betical arrangement and the codes will enable the machine 
to select modifier (secondary) and “see reference” cards to
be listed behind appropriate main headings. Within the next 
several months the Technical Information Service plans to 
compile a code and authority listing for report numbers. 
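
The coding idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AEC procedure: the function names, the sample headings, and the even spacing rule are hypothetical, and only the six-digit range comes from the text. The point is that codes assigned in alphabetical order, with gaps between them, let a purely numeric sort reproduce the alphabetical arrangement while leaving room for later headings.

```python
def assign_codes(headings, max_code=999_999):
    """Spread codes evenly over 1..max_code, in alphabetical order of the headings."""
    ordered = sorted(headings, key=str.upper)
    step = max_code // (len(ordered) + 1)          # gap left between neighbours
    return {h: (i + 1) * step for i, h in enumerate(ordered)}


def code_between(low, high):
    """Pick an unused code strictly between two neighbours, or None if no room is left."""
    mid = (low + high) // 2
    return mid if low < mid < high else None


# Hypothetical sample headings (not taken from TID-5001).
codes = assign_codes(["URANIUM", "BERYLLIUM", "PLUTONIUM", "THORIUM"])

# A heading added later falls alphabetically between PLUTONIUM and THORIUM.
codes["REACTORS"] = code_between(codes["PLUTONIUM"], codes["THORIUM"])

# Sorting on the numeric code alone gives the alphabetical sequence.
by_code = sorted(codes, key=codes.get)
```

When `code_between` returns None the whole-number gap is used up; in the scheme described in the text, that is the point at which the two-digit decimal field would supply further room.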

Composition and coding of subject and corporate main heading 
cards.—The subject and corporate main headings and cross 
references in TID-5001 and TID-5059 were composed on
punchcards—one line of copy to each card (fig. 2). Care was 
taken during composition to insure that the vertical and 
horizontal alinement was exact and that the indentations 
from the left margin were consistent in order to obtain even 
margins on the reproduced pages. The cards were proof- 
read and checked to guarantee their accuracy. 

Code numbers were then automatically punched into each 
card. The 15,000 main subject headings and the 4,500 
corporate headings were punched with the appropriate 
main heading code number, and for checking purposes these 
numbers were interpreted by the IBM alphabetical interpreter 
on the punchcards to the left of and on the same horizontal 
line as the image area. Blank IBM cards for spacing were 
inserted by the IBM collator after every 60 cards. The
IBM punchcards were then photographed on the mechanized
camera directly on to positive photographic paper through 
the paper in order to maintain normal left to right reading 
of the copy. The photographic listings were cut into pages 
at the blank areas and bound into code books, one for the 
subject headings and one for the corporate headings. Mul- 
tiple copies were made and were assigned to the professional 
indexers for their use in selecting the proper code for the 
index entries at the time each article is indexed. One code 
book was coded with appropriate “see reference” codes, and
these codes were subsequently punched into the main head- 
ing cards. After preparation of the master code books the 
coded IBM punchcards were stored for later use as reproduc- 
tion image copy for the subject and corporate index sections 
of Nuclear Science Abstracts. 
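
As a rough check on the scale of this step, the spacing and camera figures quoted above can be combined in a short calculation. The sketch below is an estimate only. It assumes that a blank card takes the same camera time as a composed card, and it ignores cross-reference cards and setup time; the function names are hypothetical.

```python
import math

CARDS_PER_PAGE = 60   # a blank spacing card follows every 60 cards (from the text)
CAMERA_RATE = 230     # cards photographed per minute (from the text)


def with_spacing_cards(cards, every=CARDS_PER_PAGE, blank=None):
    """Return the deck with a blank card inserted after every `every` cards."""
    out = []
    for i, card in enumerate(cards, start=1):
        out.append(card)
        if i % every == 0:
            out.append(blank)
    return out


def camera_minutes(n_cards, rate=CAMERA_RATE):
    """Minutes of camera time for n_cards at the quoted rate."""
    return n_cards / rate


# The 15,000 main subject heading cards mentioned in the text.
deck = with_spacing_cards([f"heading {n}" for n in range(1, 15_001)])
pages = math.ceil(15_000 / CARDS_PER_PAGE)
minutes = camera_minutes(len(deck))
```

Under these assumptions the subject code book runs to 250 pages, and the deck, including its 250 blank cards, passes through the camera in a little over an hour.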

The main heading cards are to be used repeatedly for the 
individual and cumulative index issues of NSA, as well as for 
future revisions of the code and heading authority books. 
Selection of type faces was made only after thoughtful con- 
sideration. Specific type faces employed are not cited here 
since they are largely a matter of individual preference and 
are not germane to the techniques being discussed except 
that, once selected, they must be accepted for long-term use 
to avoid complicating the preparation of the cumulations. 

Daily workloads for all regular NSA issues.—A complete 
index entry in NSA is made up of two types of cards which 
reflect the information in the table below: 





Index                    Main heading cards         Secondary cards

Subject index            Primary subject            Subject modifier, abstract number,
                                                    and report number where provided.
Author index             Personal author            Short title, abstract number, and
                                                    report number where provided.
Corporate author index   Name of organization       Short title, abstract number, and
                                                    report number.
Report number index      Alphabetical portion of    Numerical portion of report number,
                         report number              abstract number, and availability
                                                    citation.


Main heading cards for the subject and corporate author
indexes have been coded and prepared in advance as described
under “Composition and coding of subject and corporate main
heading cards.” Main heading cards and secondary
cards must be prepared for the personal author and report 
number indexes. In addition, secondary cards are required 
for both the subject and corporate indexes. 

Preparation of journal reproduction copy.—On a daily 
basis, the reproduction image copy is prepared for each 
abstract on IBM proportional spacing typewriters.
Manuscript for the index entries of each of the four NSA indexes
is temporarily withheld from composition because the 
abstract number has not yet been assigned. The reproduc- 
tion copy for the abstract and the manuscript for the index 
entries are held for assignment of the abstract number. 

On the scheduled semimonthly cutoff date the abstracts 
for a given issue are grouped into major subject categories and
are numbered sequentially. Page makeup of the abstract 
portion of the abstract journal is initiated. 

The abstract numbers are added to the index manuscript 
and this manuscript is composed by typewriters on punch- 
cards—one line per card. Since main heading cards have 
already been prepared for the subject and corporate author
indexes, only secondary cards for these indexes are prepared.
Main and secondary personal author cards, as well as main 
and secondary cards for the report number index are also 
prepared. After proofreading, the cards for each index are 
punched as follows for automatic sorting from information 
appearing on the card: 

1. Into the subject index secondary punchcards are
punched the numerical code of the appropriate main- 
heading card and the first 23 letters of the subject 
modifier which has already been composed in the image 
area of the card; 

2. Into the corporate author index secondary punch- 
cards are keypunched the numerical code for the organ- 
ization and the first 23 letters of the report title which 
appears in the image area; 

3. Into the personal author index IBM cards—both 
main and secondary—is punched the author’s name. 
In addition the first two letters of the title appearing in 
the image area are also keypunched into the secondary 
cards; 

4. Into the report number index main IBM card is 
keypunched the alphabetical portion of the report num- 
ber. Both the alphabetical portion and numerical 
portion of the report number are keypunched into the 
secondary cards. 
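
The four punching rules can be restated as sort keys, one per index. The following is a minimal sketch, not a description of the actual card layout: the function names, the sample values, and the decision to ignore spaces and punctuation when counting letters are assumptions. Only the 23-letter and 2-letter limits and the contents of each key come from the text.

```python
import re

KEY_LEN = 23  # letters of the modifier or title punched for sorting (from the text)


def letters(text, n):
    """First n letters of text, upper-cased, ignoring spaces and punctuation."""
    return re.sub(r"[^A-Za-z]", "", text).upper()[:n]


def subject_key(main_code, modifier):
    # Rule 1: main heading code plus the first 23 letters of the modifier.
    return (main_code, letters(modifier, KEY_LEN))


def corporate_key(org_code, title):
    # Rule 2: organization code plus the first 23 letters of the report title.
    return (org_code, letters(title, KEY_LEN))


def author_key(name, title=None):
    # Rule 3: main cards carry the name only; secondary cards add two letters of the title.
    return (name.upper(), "" if title is None else letters(title, 2))


def report_key(alpha, number=None):
    # Rule 4: main cards carry the alphabetical portion; secondary cards add the number.
    return (alpha.upper(), -1 if number is None else number)
```

With keys of this shape a main card always sorts ahead of its own secondary cards, since the empty string and -1 precede any real title letters or report number.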

Mechanical arrangement of indexes.—To arrange the subject 
index, secondary cards are sorted by main heading code number
and, within each heading, alphabetically by the first 23 letters
of the image area, which have previously been keypunched into
the cards. The collator then matches this file against the main
heading file, selects the main heading card, and places it in
front of its appropriate modifier cards. Cards for the corporate
author index are arranged in a similar manner. Personal
author cards are sorted by the IBM sorter alphabetically.
The short titles on the secondary personal author cards are
arranged alphabetically behind each author by the first two
letters of the short title. The secondary cards for the
report number index are sorted by report number and 
collated behind the appropriate boldface alphabetical 
designation for the report. 
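
The sort-and-collate arrangement of the subject index might be modelled as follows. This is a sketch under assumptions: cards are represented as simple tuples, the headings and abstract numbers are invented, and a single software sort stands in for the column-by-column passes a card sorter would make.

```python
from itertools import groupby


def arrange_subject_index(main_headings, secondary_cards):
    """main_headings: {code: heading text}.
    secondary_cards: list of (code, sort_key, image_text) tuples."""
    # Sorter: order by main heading code, then by the punched letters of the modifier.
    ordered = sorted(secondary_cards, key=lambda c: (c[0], c[1]))
    listing = []
    # Collator: place each main heading card in front of its modifier cards.
    for code, group in groupby(ordered, key=lambda c: c[0]):
        listing.append(main_headings[code])
        listing.extend(text for _, _, text in group)
    return listing


# Hypothetical sample data.
main = {200: "CORROSION", 700: "URANIUM"}
cards = [
    (700, "METALLURGY", "metallurgy of, 1042"),
    (200, "STAINLESS", "of stainless steel, 1017"),
    (200, "ALUMINUM", "of aluminum, 1003"),
]
listing = arrange_subject_index(main, cards)
```

Only headings that actually have secondary cards appear in the listing, which matches the selection behavior described for the collator.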

Mechanized camera and page makeup.—The final made-up 
index pages are in two column format. The number of lines 
for each column is established and IBM cards are added as 
appropriate for spacing, running heads, and page numbers. 
The cards are photographed by the mechanized camera at 
the rate of 230 per minute. Paper camera proofs for a final 
check are prepared for the editors. When the proof is
approved the cards are photographed on roll film. The film is
developed in the automatic processal machine, cut into gal- 
leys, and is made up into book pages which serve as plate neg- 
atives for offset printing. 
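
The two-column makeup can be pictured as splitting one long run of lines into columns and pages. The sketch below is illustrative only. The text does not state the number of lines per column, so the figure of 60 used in the example is an assumption, and the function name is hypothetical.

```python
def make_up_pages(lines, lines_per_column, columns=2):
    """Split a list of index lines into pages, each holding up to `columns` columns."""
    per_page = lines_per_column * columns
    pages = []
    for start in range(0, len(lines), per_page):
        chunk = lines[start:start + per_page]
        cols = [chunk[i:i + lines_per_column]
                for i in range(0, len(chunk), lines_per_column)]
        pages.append(cols)
    return pages


# 250 photographed lines, 60 lines per column (assumed), two columns per page.
pages = make_up_pages(list(range(250)), lines_per_column=60)
```

In the card system the same effect is obtained by inserting spacing cards at the column boundaries before the deck is photographed, so that the film can be cut into galleys of equal depth.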

Preparation of cumulated indexes.—Composed IBM cards 
for each index become available for cumulations after they 
have been photographed on film for the regular NSA issue. 
The main heading cards for all four indexes are decollated 
for use in subsequent individual issue indexes and cumula- 
tions; the secondary cards are interfiled with the cards of 
previous issues of the volume and held for future cumulations. 

When a cumulation is to be prepared, such as the first 
quarterly cumulation of the subject index in volume 13, the 
secondary subject cards for the first six regular issues of the 
volume have already been interfiled and the following steps 
are taken: 

1. The IBM collator matches the secondary cards 
against the file of main-heading cards and selects the 
appropriate main heading cards; 

2. The IBM collator matches the selected main
heading cards against the “see references” and selects the
appropriate “see references”;

3. The IBM collator merges the main subject heading
cards and the “see reference” cards so that the “see
reference” card is located behind the appropriate main
heading;

4. The IBM collator merges the merged file of main
heading cards and “see references” with the secondary
subject cards so that all cards are in proper sequence.
“See references” are scheduled to appear only in quar- 
terly, semiannual, annual, and multiannual cumulations. 
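
The four collator steps for a cumulation can be sketched as below. The data shapes, names, and sample entries are hypothetical. The placement of the reference cards directly behind the heading and ahead of the modifier cards is an assumption, since the text says only that all cards end up in proper sequence.

```python
def cumulate_subject_index(main_file, see_refs, secondary_cards):
    """main_file: {code: heading}; see_refs: {code: [reference text, ...]};
    secondary_cards: (code, sort_key, text) tuples interfiled from several issues."""
    # Step 1: select only the main headings that have secondary cards.
    used = sorted({code for code, _, _ in secondary_cards})
    # Steps 2 and 3: select the matching references and file them behind their heading.
    heads = {code: [main_file[code]] + see_refs.get(code, []) for code in used}
    # Step 4: merge the headings and references with the secondary cards.
    listing = []
    for code in used:
        listing.extend(heads[code])
        group = sorted((c for c in secondary_cards if c[0] == code),
                       key=lambda c: c[1])
        listing.extend(text for _, _, text in group)
    return listing


# Hypothetical sample data.
main_file = {100: "BERYLLIUM", 200: "CORROSION", 300: "URANIUM"}
see_refs = {200: ["(see also OXIDATION)"]}
cards = [
    (200, "STAINLESS", "of stainless steel, 1017"),
    (300, "METALLURGY", "metallurgy of, 2044"),
    (200, "ALUMINUM", "of aluminum, 1003"),
]
cumulated = cumulate_subject_index(main_file, see_refs, cards)
```

BERYLLIUM does not appear in the result because no secondary card carries its code, which is the effect of the selection in step 1.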

IBM cards for spacing, page numbers, and running heads 
are inserted as appropriate. The quarterly cumulated sub- 
ject index is then ready to be photographed on paper, re- 
viewed by the editors, rephotographed on film and made up 
into book pages for the printer. The cumulated corporate 
author index is prepared in a similar manner. 

The personal author and report number indexes appearing 
in each cumulation of NSA are prepared in essentially the 
same manner, the cards from the regular issues being sorted 
and arranged by IBM machine to form the required cumula- 
tions. Duplicate author names are removed by the collator. 
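
The removal of duplicate author names during cumulation can be illustrated in the same way. This is a sketch under assumptions: each card is a hypothetical (author, kind, text) tuple, and the sample names and abstract numbers are invented.

```python
def cumulate_author_index(issues):
    """issues: list of per-issue card lists; each card is (author, kind, text),
    where kind is "main" or "secondary"."""
    cards = [card for issue in issues for card in issue]
    # Main card first for each author, then secondaries by the first two title letters.
    cards.sort(key=lambda c: (c[0].upper(), c[1] != "main", c[2][:2].upper()))
    listing, seen = [], set()
    for author, kind, text in cards:
        if kind == "main":
            if author in seen:
                continue              # duplicate author name card: drop it
            seen.add(author)
        listing.append(text)
    return listing


issue_1 = [("Smith, J. A.", "main", "Smith, J. A."),
           ("Smith, J. A.", "secondary", "Neutron flux, 101")]
issue_2 = [("Smith, J. A.", "main", "Smith, J. A."),
           ("Smith, J. A.", "secondary", "Gamma shielding, 207"),
           ("Adams, R.", "main", "Adams, R."),
           ("Adams, R.", "secondary", "Zirconium alloys, 230")]
author_index = cumulate_author_index([issue_1, issue_2])
```

Each author's name is listed once, followed by the short titles from every issue in the cumulation.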

Quarterly cumulations are decollated in the same manner 
as regular issues and the cards are used in subsequently 
scheduled cumulations, i.e. semiannual, annual, and multi- 
annual cumulations. 

Elimination of card catalogs.—The complete indexes in the 
individual issues of NSA combined with the frequent and 
prompt index cumulations have eliminated the need for 
continuing unclassified AEC card catalogs. 

There will no longer be a requirement for printing,
distributing, and filing millions of AEC catalog cards each year.
The attendant savings both to the AEC and to all libraries
maintaining card catalogs are obvious. Each user of Nuclear
Science Abstracts whether at the laboratory bench or in the
library will have in effect a complete catalog to nuclear 
science information. 

Neither magic techniques nor custom-built machines have 
been responsible for the NSA achievement. Success was due,
instead, to the practical application and combination of
existing equipment and techniques. 

The AEC classified abstract journals utilize the same 
techniques described above. Each abstract journal is 
not only an up-to-date announcement medium, but provides 
each user with a complete and current key to the world’s 
atomic energy literature. AEC scientists have
enthusiastically endorsed the new indexing patterns as giving them at
their benches an effective retrospective searching
tool.


Research and development reports 


Reports have formed the major proportion of the literature 
of atomic energy. It is true that the ratio of journal articles 
is increasing; nevertheless, reports remain one of our most 
important sources of scientific and technical information 
for atomic energy. In a sense, they represent a new form of 
literature that requires special forms of bibliographic control. 
In the early days of the U.S. nuclear program, the AEC had 
no intention of setting up a special comprehensive biblio- 
graphic apparatus to control its literature. However, when 
attempts were made to channel the report literature into the 
established abstracting services, they refused to accept 
reports because of their fear that the reports would not be 
generally available. As a result, it became necessary for 
the AEC to organize a comprehensive bibliographic proces- 
sing system for its reports. 

Reports are a more primitive form of scientific and 
technical literature than professional journal articles because 
they are produced earlier in the research program. Since 
the scientific community now consumes information virtually 
as quickly as it is produced, this information must be 
recorded as quickly as possible. 

The bulk of the AEC technical information has been made 
available in the 28,000 research and development reports 
published by the AEC itself. Additional AEC reports, 
estimated at 4,000, will be produced and distributed during 
1960. Generally, the reports fall into two categories, 
progress type and summary. A progress report may cover 
a specific project or a large number of activities of an entire 
laboratory for a given period of time, and is usually issued at 
the regular monthly, quarterly, semiannual, or annual inter- 
vals. In the normal course of scientific progress the results 
described may be superseded. In general practice proven 
data are extracted from progress reports and published in the
summary reports or in the technical journal literature. The 
progress report is important because it is the only available 
written source of the material until the summary report or 
technical journal article is published and delays in writing 
and publishing summary reports or journal articles may run
as high as a year or two. Moreover, progress reports are 
usually the only written source of negative information, that 
is, the record of the difficulties and failures experienced with 
certain experiments or techniques; and of the incidental data 
such as the preparations necessary for experiments and the 
rationale for adopting certain approaches. This information 
is often missing from the summary report or journal article. 

Although important, the progress report is primarily a tem- 
porary reference until the summary report appears. The 
summary report, of course, is the most useful report and the 
type most frequently sought. The summary report usually 
is well written and is, therefore, very valuable to the scientific 
worker with limited time for literature study. The scientist 
desiring more detailed information will often supplement the 
topical summary report with progress reports. 

The AEC maintains a standard distribution system to 
facilitate the broadest and most expeditious means of dis- 
tributing technical reports to authorized official recipients in 
the national nuclear energy effort. This system provides for 
the direct distribution of reports by issuing organizations 
to installations that have an interest in, or a requirement for, 
the information in the reports. The standard distribution 
system applies to both classified and unclassified report litera- 
ture. In the case of classified information, however, distri- 
bution must be made consistent with the requirements of 
national security. Distribution of sufficient copies of reports 
is made to TIS to fill requests and to distribute to other or- 
ganizations which have a need for the information but which 
are not listed in the standard classified and unclassified 
distribution lists. 

AEC contractors generally reproduce and distribute their 
own technical reports. When local reproduction facilities 
are limited or nonexistent, TIS undertakes the reproduction 
and distribution of both classified and nonclassified technical 
reports that are sponsored by AEC funds. 

Proceedings of scientific symposia and meetings —During 
the past decade, meetings, conferences, and symposia have 
become important mechanisms for the exchange of informa- 
tion on virtually every aspect of nuclear science and tech- 
nology. Papers prepared for these gatherings are a valuable 
technical information resource. The U.S. Atomic Energy 
Commission seeks to employ this resource for the benefit of 
the world’s scientific community by publishing the proceed- 
ings of technical meetings. A listing and description of 
proceedings of meetings in which the Commission or its 
contractors were sponsors, cosponsors, or major partici- 
pants was filed with the committee. 

Technical books.—Literally tens of thousands of reports are 
being written each year throughout the world in the various 
scientific disciplines involved in atomic energy work. Be- 
cause the number of reports is so great and the rate of 
obsolescence is high, there is an essential need for com- 
prehensive books and review journals which condense the 
volume of printed words into high quality “state-of-art”
presentations. To help meet the need for such books the
USAEC undertook in 1955 an intensive program to en- 
courage publication and, where necessary, to sponsor 
preparation of book manuscripts. Through government 
financial support and special contractual arrangements, 
USAEC sponsored books have been made widely available 
at moderate prices. 

During 1958 and 1959 the AEC published 20 books and 
initiated the writing of 16 others. Twelve new AEC- 
sponsored volumes were presented to key delegates of the 
Second International Conference on the Peaceful Uses of 
Atomic Energy at Geneva in September 1958. A complete 
description of all USAEC books has been published in the 
USAEC book catalog entitled “Technical Books.”

Technical progress reviews.—Because published books in 
certain rapidly developing fields cannot be kept sufficiently 
current to meet the needs of science and industry, the 
USAEC prepares and issues Quarterly Technical Progress 
Reviews covering specific subject areas. These journals 
describe, analyze, and evaluate the latest developments 
in these fields. Not only is the published literature covered 
in these reviews but the editors, because of their intimate 
association with the existing research programs, often in- 
clude information on the latest developments in advance of 
publication of the information in formal full-length articles 
or reports. The Reviews are an efficient current reference in 
rapidly developing research areas, provide convenient sum- 
maries and are fully referenced to the basic research reports. 
Three quarterly Reviews, now in their third year of publica- 
tion, are entitled “Power Reactor Technology,” “Reactor 
Core Materials,” and “Reactor Fuel Processing.” A fourth, 
entitled “Nuclear Safety,” was initiated in September 1959.

Bibliographies.—Another important mechanism to facili- 
tate the identification, selection, and location of literature 
within specific subject areas is the bibliography. The prep- 
aration of bibliographies for retrospective scientific search- 
ing has always been an important part of AEC technical 
information activity. These compilations range from short 
title lists to comprehensive, annotated bibliographies that 
are in themselves indexed and subject categorized. 


Translations 


The USAEC conducts an active translation program. 
The emphasis at present is on those languages which are not 
widely known in the U.S. scientific community, namely, the 
Slavic and the oriental languages. All translations pertinent 
to atomic energy prepared by the USAEC and cooperating
organizations are, of course, included in Nuclear Science
Abstracts. 


Engineering drawings

In the atomic energy field a considerable quantity of
specialized equipment and laboratory facilities have been
developed. Much of the design and equipment has become
standard and can be utilized in any laboratory using
radioisotopes. The report literature for the most part pays
little attention to this type of information. The USAEC 
has recently expanded its program for the dissemination of 
information to cover engineering materials. The engineering 
materials include drawings, specifications, design criteria, 
and photographs covering all phases of nuclear engineering 
from small laboratory gadgets to complete laboratories and 
reactors. 

Technical films.—The technical film is becoming of increas- 
ing importance in technical information dissemination and 
as a scientific training tool. The USAEC has produced 71
professional level technical motion picture films. Forty-five 
of these were shown in four languages—Spanish, English, 
French, and Russian—at the Second International Con- 
ference on the Peaceful Uses of Atomic Energy at Geneva in 
September 1958. The films cover a wide variety of topics 
from power reactors to uses of isotopes and radiation in in- 
dustry, agriculture, medicine, and research. They are in 
color and black and white and range in length from 5 to 55 
minutes. 

Prints of these films are available on loan from AEC offices 
in this country. Overseas, with foreign language sound 
tracks, they are available on loan from USAEC Scientific 
Representatives abroad or from U.S. Information Service 
offices. 

Depository libraries 

The USAEC has provided 84 domestic depository libraries 
and 83 depository library collections in 58 countries outside 
of the United States. They have been provided as part of 
President Eisenhower’s Atoms-for-Peace program. 

A library today consists of over 28,000 reports, some in 
microcard form and some in full-size copy; a complete set of 
Nuclear Science Abstracts; over 90 books; a complete set of 
AEC bibliographies; Technical Progress Reviews; and a 
microcard reader. In addition, all new materials, as pub- 
lished, are sent automatically to the library. Currently, 
AEC reports are being added automatically at a rate of 4,000 
each year. 


Information exchanges 


Nuclear science information flows from the United States 
to other countries either (1) under arrangements covering 
cooperation between this country and other countries or 
international organizations; or (2) under arrangements more
limited in scope and covering individual document exchanges
between the AEC and overseas research organizations,
industrial organizations, learned societies, universities,
academies and institutes.

The largest foreign contributors of reports and other in- 
formational materials to the United States are naturally 
those countries with older active atomic energy programs. 
During calendar 1959, for example, Australia, Belgium, 
Canada, Great Britain, Switzerland, and the Netherlands 
contributed over 74 percent of the 1,976 topical reports,
translations, and reprints received from foreign sources and 
processed by the AEC for the U.S. scientific community. 

The AEC is constantly refining and expanding its infor- 
mation exchange program to effect acquisition at the earliest 
possible date of the documented results of any non-U.S.
atomic energy research. United States participation in the 
International Conference on the Utilization of Atomic Energy 
Scientific and Technical Information, May 26-29, 1958, in 
Geneva, Switzerland, and the subsequent visits of informa- 
tion officers from the International Atomic Energy Agency, 
the United Kingdom, Euratom and other European organi- 
zations to this country have done much to promote the 
prompt receipt of European informational materials by the 
AEC for use and inclusion in the various U.S. programs. 

As required by the Atomic Energy Act of 1946 and its 
revision of 1954, the USAEC science information exchange 
program is giving fullest possible support to the Commission’s 
mission of “supporting to the utmost the interchange of 
scientific information throughout the world’s scientific 
community.” The AEC’s technical information program is
the world’s standard in the nuclear science field and is the 
framework upon which has been or is being built the atomic 
energy technical information programs of all countries of the 
free world. The very large measure of good will and
prestige resulting from this activity is especially valuable at
this time of international scientific competition.

Of equal importance, through this program, the United 
States receives thousands of valuable scientific documents 
of immediate use to the Nation’s scientific programs.
Specific values accruing to this country derive from the fact
that the program aids in eliminating from the laboratory
research of U.S. scientists unnecessary and duplicatory
efforts and thus accelerates existing U.S. scientific programs 
and releases valuable scientific manpower to explore new 
challenging areas. This saving of critical calendar time for 
the United States in the competition for world scientific 
leadership is of inestimable value. 

Through the AEC information exchange program the 
overall fund of scientific knowledge available to all scientists 
has been greatly increased. The AEC has set the pace 
among free peoples in the dissemination of scientific knowl- 
edge and the AEC distribution procedures and patterns are 
being adopted by all major nations in the free world for use 
in disseminating their own technical data. This worldwide 
dissemination of the scientific knowledge of the peaceful 
uses of the atom fosters the cross fertilization of ideas 
necessary for scientific advancement. 

Skills and techniques in scientific documentation in other 
countries are not as advanced as those developed by the 
AEC for the bibliographic organization and control of the 
tremendous volume of recorded nuclear science data. 
However, the AEC is programing specialized technical infor- 
mation workshops to help nations and international organi- 
zations with U.S. Atoms-for-Peace collections to derive the
maximum value from this resource. The first such workshop 
was held in Geneva in May 1958 with the main objective of 
maximizing the utility of the Atoms-for-Peace collections 
but also to insure automatic provision to the United States 
of valuable documentary scientific materials previously either 
unavailable or available only after lengthy delays and com- 
plex procurement actions costly in terms of time, man- 
power, and money. 


In addition, the Assistant Director for Technical Information 
Service, Division of Information Services, AEC, provided the staff 
with a draft prepared as of February 18, 1960, on the program being 
studied and formulated by the Committee on Information Systems, 
setting forth a comprehensive and thoughtful approach to problems
involved in the expansion of existing information processing systems
as follows:


The Committee on Information Systems believes that there 
are a number of questions that must be settled by the panel 
or its committees before we can proceed further with the 
task of finding the optimum machine system for handling 
our technical information problems. We believe that there 
are machines existing that possess the capacity for handling our
task under certain conditions. However, the kind of appli- 
cation that we make and the requirements to be set on the 
system as to the scope of coverage, speed of operation, depth 
of retrieval, ease of maintenance, etc., will make a large dif- 
ference in the choice of machines and the cost of the ap- 
plication. 

1. We must first decide upon the scope of information to 
be handled by a machine system. That is, should it be able 
to handle both the published journal literature and the AEC 
report literature? The committee believes that the system 
should do both. The report literature is not growing nearly 
as rapidly as it used to because more of the information is 
unclassified and is published in the journals. We believe 
that if we confine our attention to report material, our exist- 
ing manual system would undoubtedly be adequate for a 
long time, and the expense of changeover to a machine sys- 
tem would be hard to justify. However, if the journal 
literature is included, it seems clear that some faster and 
better retrieval system is necessary. 

2. In addition to the scope of coverage, a very important 
decision is necessary on the depth of indexing. If we are 
to index no more deeply than we do currently, then perhaps 
a machine system would be too costly. For deep indexing, 
with many points of retrieval per document, a machine sys- 
tem seems to be the only answer. With the great machine 
speeds that are available machines can search much faster 
and cover many more headings per document than is now pos- 
sible. The greater depth of indexing would give us greatly 
improved control over the information actually included in 
the documents, which of course is the ideal toward which we 
are striving. Another and related question is whether all 
fields of knowledge should be indexed to the same depth. It
would seem reasonable that one might wish to apply the same
general rules to all fields, but to documents of certain types 
one would extend a deeper or more detailed indexing than to 
others. For example, progress reports covering a number of 
programs may require very detailed indexing, while general 
discussions on a single topic might be well enough covered by 
fewer retrieval points. This, of course, is related to the item 
of cost, which is mentioned later. This matter should prob- 
ably be referred to the Library Committee. 

3. Work should be started immediately on a dictionary or 
dictionaries of key words (descriptors) for use in a machine 
system. This task is in the province of the Library Committee.
That committee should consider the problem of
whether a single dictionary can be devised to cover all of the 
technical literature, and if not, to suggest alternatives. It 
might be discovered that the technical literature could be 
divided into a number of broad fields with a descriptor dic- 
tionary for each. For example, it seems likely to us that the 
dictionary that would cover the medical and biological field 
would probably not have very much overlapping with the 
dictionary for the engineering field. By handling the fields 
separately, each with its own characteristic dictionary, it 
might be possible to simplify both the storage and retrieval 
problems. The real intellectual labor in the system goes 
into the indexing and processing of the information into a 
form that the machine can handle. The way in which this 
is done and the care with which it is done will have a large 
impact upon the value and efficiency of the system. 

4. At the time the Library Committee is working on the 
matter of the descriptor dictionary it should also give atten- 
tion to the use of role indicators and association links to 
reduce noise or cross talk in the system. There has been a 
certain amount of pioneering work done in this field which 
indicates that the amount of false information produced by 
a machine system can be substantially reduced by such 
devices as these. Studies should be made of existing and 
workable systems such as those used by Linde Air Products, 
Western Reserve University, etc., in which role indicator
devices are used. Also, it may be necessary to develop a 
uniform document citation system that can be used for both 
reports and journal material. The great complexity and 
variety of the report numbering systems currently in use 
may force this on us. 
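
The effect of role indicators and association links on false retrievals can be illustrated with a minimal sketch. The documents, descriptors, role names, and link labels below are hypothetical; they are not drawn from the Linde or Western Reserve systems, whose actual coding schemes are not described here.

```python
from collections import namedtuple

# One index entry: a descriptor, the role it plays, and the link (statement)
# within the document to which it belongs. All values are illustrative.
Entry = namedtuple("Entry", "descriptor role link")

INDEX = {
    "DOC-1": [  # hypothetical report on coating uranium with aluminum
        Entry("uranium", "material_coated", "A"),
        Entry("aluminum", "coating_agent", "A"),
    ],
    "DOC-2": [  # hypothetical report on coating aluminum; uranium is incidental
        Entry("aluminum", "material_coated", "A"),
        Entry("zirconium", "coating_agent", "A"),
        Entry("uranium", "fuel", "B"),
    ],
}


def search_descriptors(index, wanted):
    """Plain coordinate search: every wanted descriptor occurs somewhere."""
    return sorted(
        doc for doc, entries in index.items()
        if set(wanted) <= {e.descriptor for e in entries}
    )


def search_roles_and_links(index, wanted):
    """Stricter search: all (descriptor, role) pairs must share one link."""
    hits = []
    for doc, entries in index.items():
        for link in {e.link for e in entries}:
            pairs = {(e.descriptor, e.role) for e in entries if e.link == link}
            if set(wanted) <= pairs:
                hits.append(doc)
                break
    return sorted(hits)


plain = search_descriptors(INDEX, ["uranium", "aluminum"])
strict = search_roles_and_links(
    INDEX, [("uranium", "material_coated"), ("aluminum", "coating_agent")]
)
```

In this sketch the plain search returns both documents, the second being a false retrieval, while the search that respects roles and links returns only the first.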

5. The question of the retrieval of documents versus the 
retrieval of data must be settled. Should we try to do both, 
or should we try to do one in certain areas or fields and the 
other in different fields? For the maximum benefit to the 
applied researcher, a system that will produce numerical data 
(e.g., cross sections, melting points, and other physical,
chemical, or nuclear properties) would be extremely valuable. 
On the other hand, there is much of the literature that deals 
with ideas and results that cannot easily be described in 
numerical terms. There is evidence that a system for the 
retrieval of documents by the use of descriptors, role indicators,
etc., may not be applied directly to the problem of
correlating raw numerical data. One machine could probably
handle both problems but the ordering of the input
would differ greatly. 

6. The decision as to whether we should have a com- 
pletely centralized or a partially decentralized information 
service for AEC contractors must be made. A system that 
appeals to our committee is to have TIS establish and 
support a machine system, but also make the cards or tapes 
that are developed for use in that system available to con- 
tractors who have equipment that can be used for their 
manipulation. Of course, we believe that the AEC should 
cooperate with the National Science Foundation and the 
professional societies in developing national information 
centers covering various subject fields. However, we do 
not believe that the AEC should wait until this development 
has taken place and the national centers are established 
before proceeding on its own. Tied in with this question is, 
of course, the broader question, What do we want our 
information system to do? 

7. After some of the above questions have been answered, 
the decision must be made as to what machines will best 
handle our problem. As was mentioned above, there are 
several machines on the market that can probably handle 
either our whole problem or various phases of it, and which 
cost widely varying amounts of money, depending upon 
what we want them to do. This leads to an evaluation of 
the cost factors. How much money can the AEC afford to 
spend on its information system? What are the factors 
which are most important in determining the cost of a 
mechanical system? One of these, of course, is whether the 
information is stored internally in the machine or in the form 
of cards or tapes external to the machine. Other examples 
are the cost of preparation of the material for the machine, 
the cost of searching, the cost of posting and filing new data, 
etc. These are all important in measuring the total over- 
all cost. The Committee on Machine Systems can prepare 
some recommendations on the choice of machines based on 
the answers to the questions posed here. 
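
The cost factors named in this item can be combined in a simple additive model. The sketch below is only an illustration of how such a comparison might be set up: the function name, the parameters, and every figure are placeholders, and none comes from the committee draft.

```python
def annual_cost(machine_rental, documents, searches,
                prep_per_document, posting_per_document, cost_per_search):
    """Sum the cost factors named above for one year of operation."""
    preparation = documents * prep_per_document
    posting = documents * posting_per_document
    searching = searches * cost_per_search
    return machine_rental + preparation + posting + searching


# Placeholder figures only. Deeper indexing is assumed to raise the
# per-document preparation and posting costs and nothing else.
shallow = annual_cost(50_000, 30_000, 2_000, 1.00, 0.25, 5.00)
deep = annual_cost(50_000, 30_000, 2_000, 4.00, 0.50, 5.00)
```

Under these assumed figures the per-document preparation cost, not the machine itself, accounts for most of the difference between the two cases, which is consistent with the draft's emphasis on indexing labor.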

8. The relationship between a mechanized search system 
and Nuclear Science Abstracts (NSA) should be studied, probably by
the Library Committee. If such a system is successfully 
established and put into operation, is NSA still necessary? 
We believe it is. In that event, can the work that goes into 
the preparation of NSA be utilized to prepare the informa- 
tion for the machine system without complete duplication? 
Also, some thought should be given to the use of the output 
of the machine system in the indexing of NSA by perhaps 
some change of form in the index. That is to say, it might 
be possible to combine the systems with some considerable 
reduction in expense without compromising the quality or 
effectiveness of either one. 

9. After it has been decided what the scope of this system 
should be, what depth of indexing is to be used, and after a
dictionary has been prepared and is ready for use, how shall
the system be put into effect? The alternatives seem to be 
to have TISE do the job with the temporary acquisition of 
additional labor, or to have an outside organization set up 
the system and put it into operation and then turn it over to 
TISE after it has been debugged and is working effectively. 
Of course, the organization that supplies the machine equip- 
ment will furnish technical assistance in getting it into 
operation, but it would seem wise to employ also some or- 
ganization that is not engaged in selling the equipment of 
any particular manufacturer. 

(Excerpt from C. G. Stevenson’s letter of February 25, 
1960, to R. K. Wakerling:) 

“This section deals with a very complicated problem but 
I think your paragraph is quite adequate for our purposes. 
But many questions come to mind when you think of how 
NSA would fit into a mechanized program. Can descriptors 
suitable for machine retrieval be used in a subject index to 
NSA? I rather doubt it. If they cannot be so used, then the
alternative is duplicate indexing or developing a new kind of 
subject index to NSA. Another basic problem is how to 
handle the sheer bulk of NSA, presently increasing at a 
geometric rate. Even with fewer subject entries per abstract, 
it appears that the 5-year cumulations would soon be too 
unwieldy to manage. Yet I am certain that we must continue
to publish NSA as an announcement bulletin and as a retro- 
spective searching tool for many individuals and small 
businesses who will not have machine systems available to 
them. 

“We will always need to maintain a searching tool for the 
‘bread and butter’ questions that make up most of the 
activity of a technical information operation. I think that
the disappointing results which Dr. Whaley and Bernie 
Dennis of ACT report for their machine systems is primarily 
due to the fact that the number of occasions when a detailed 
literature search is called for are quite rare. For the rest of 
the working day there are a multitude of relatively simple 
questions requiring simple tools to handle. I think our 
problem may boil down to the development of NSA for 
handling ‘bread and butter’ questions with machine retrieval, 
the exchange of tapes, etc., being the basic technique for 
detailed subject searching and analysis. A possible solution 
to the NSA problem might be the publication of a number of 
abstract journals, following the pattern of the four Technical 
Progress Reviews. These abstract journals could separately 
cover the broad fields which you mention in item 3.” 


BIO-SCIENCES INFORMATION EXCHANGE


The staff consulted at length with Dr. Leonard Carmichael, Secretary,
and with Dr. J. Keddy and Dr. A. Remington Kellogg,
Assistant Secretaries of the Smithsonian Institution, which has
administrative jurisdiction over the operations of BSIE, relative to its
operations and services. BSIE is not supported by funds appropriated
to the Smithsonian, but is paid for in full by funds provided by the
agencies served by the exchange.

As set forth elsewhere in this report (pp. 13-15), the staff received a 
number of different versions regarding the coverage and efficiency of 
the operations carried on by BSIE. Some representatives of Federal 
agencies reported that the information provided in this field was 
adequate in every respect, and of great value to scientists, researchers, 
and agencies operating in this field, and to industry generally. Spokes- 
men for certain industries maintained that the coverage was too 
limited, that a number of agencies which generated information and 
data which should be included in the BSIE program did not partici- 
pate, and in some instances maintained duplicating records, whether 
or not they were participants. 

Dr. Carmichael and his associates were convinced that the BSIE 
program is performing a necessary service, that it was adequate in 
its coverage, and that efforts were now being made to bring about 
full participation by all Federal agencies developing material which 
should be included in the operation. These officials also assured the 
staff that a similar service in the physical sciences, which is being set 
up in the Smithsonian Institution under the same administrative con- 
trols would also provide adequate information and services in that field. 

After a review of the sections of this report dealing with the oper- 
ations of BSIE, Dr. Carmichael advised the staff that— 


* * * on the whole the comments with reference to the
Bio-Sciences Information Exchange are fair statements of
the operations of the Exchange. There is, however, a fun- 
damental misconception of its purpose and function. The 
BSIE was not designed as a “documentation center”; the
services it performs in this field are a fortunate by-product 
of its primary objective, i.e., the prevention of unknowing 
duplication of research awards among those agencies which 
support it. On page 127 the NSF reports: “The services
which the BSIE has performed in the several years of operation
demonstrate the needs, feasibility, and value of the
exchange of information on current research,” and the agencies
supporting the BSIE take pride in the fact that a side
issue has developed into the only comprehensive source of 
information on current research in medicine and biology in 
existence. 


In view of the many conflicting reports relative to the adequacy 
and effectiveness of the services provided by the BSIE, the following 
report was compiled by Dr. Edward Wenk of the Library of Congress, 
at the request of the staff of the Subcommittee on Reorganization 
and International Organizations in connection with its studies of 
international health programs. This report sets forth the status, 
program, participants, and other pertinent information relative to 
the extent of the services being rendered by the BSIE. 


The Bio-Sciences Information Exchange (BSIE) maintains 
an index of current research projects which represents the 
only coordinated single source of such information in the 
biological and medical sciences. It is fundamentally re- 
sponsible for gathering, organizing, and relating data on 
unclassified research supported by both Federal and non-Federal
agencies. This information is made available to all
granting agencies and primarily utilized to prevent unknowing
duplication of research support. The BSIE also serves
research investigators. Sufficient project details are cata- 
loged on the content of research in progress so that scientists 
may obtain names of others who are working on related or 
parallel lines and who through direct contact may be a source 
of technical information prior to publication of reports. 

The Bio-Sciences Information Exchange is an independent 
establishment located in Washington, D.C., administratively 
attached to the Smithsonian Institution. 

Policies of BSIE are determined by a Governing Board 
composed of two representatives from each of the seven 
participating Federal agencies and the Smithsonian Institu- 
tion. These are: Atomic Energy Commission, the Depart- 
ments of the Army, Navy, and Air Force, Public Health 
Service, Veterans’ Administration, and the National Science 
Foundation. Funds for operation are granted by these 
same agencies out of their operating budgets. The Governing
Board operates under an “agreement” which serves as
a charter outlining the responsibilities of the Exchange and 
the services it may offer granting agencies and individual 
scientists. Non-Federal cooperating agencies are not repre- 
sented on the Governing Board. 

The Department of Agriculture chose not to participate, 
and the BSIE does not include projects in the agricultural 
sciences because a similar index covering agricultural subject 
matter was already in effective operation by USDA when 
BSIE was inaugurated. The USDA system is described 
subsequently. Incidentally, some of the governing agencies 
also maintain autonomous project indices in biological and 
medical sciences, for internal administrative use, but no 
study was made of their various systems or procedures. 

Currently, the BSIE index records about 21,000 active 
research projects, which are sponsored by Federal and non- 
Federal agencies, both intramural and extramural, although 
primarily the latter. The number of projects terminated 
since 1946 and maintained in a separate file is much greater; 
12,410 new research grants and contracts were registered in 
1958. Of these, 9,993 were grants for unclassified research by 
Federal agencies amounting to $137,901,673; 2,417 were 
grants totaling $33,302,659 by fund-raising organizations 
and private foundations. 
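
The 1958 registration figures above can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the counts and dollar amounts stated in the paragraph; the variable names are illustrative.

```python
# Figures as reported for grants and contracts registered in 1958.
federal_count, federal_dollars = 9_993, 137_901_673
private_count, private_dollars = 2_417, 33_302_659

total_count = federal_count + private_count      # agrees with the 12,410 reported
total_dollars = federal_dollars + private_dollars

federal_average = federal_dollars / federal_count
private_average = private_dollars / private_count
federal_share = federal_dollars / total_dollars
```

The two counts sum to the 12,410 registrations reported, the combined support comes to $171,204,332, the average award is a little under $13,800 in both groups, and the Federal agencies account for roughly 81 percent of the dollars.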

Since BSIE was originally organized because of concern 
by Federal granting agencies over unknowing duplication 
of support in the biological and medical sciences its coverage 
of federally supported extramural research has been empha- 
sized. Intramural research of some of the participating 
Federal agencies is gradually and voluntarily being added.
Most extramural research by the private agencies such as 
medical foundations is voluntarily reported to the Exchange, 
and an increasing number of individual researchers are list- 
ing their non-grant-supported intramural research. This 
last year, approximately 300 volunteered such information. 




Data on research proposals are also entered, and are 
available to granting agencies on request while the proposal 
is active. If it is rejected, the entry is destroyed; if accepted
it assumes the status of an active project. 

The data at BSIE is confined to basic and applied research
which has not reached the publication stage. Information
pertaining to educational, construction, or facility grants or
the rehabilitation and control programs of the governmental
and fund-raising organizations is not in the main registered
with the BSIE. Likewise, all security classified and proprie- 
tary information is excluded. Industrial research work 
such as by drug manufacturers is not registered with the 
Exchange at the present time, because of the problem of 
protecting proprietary rights. 

The BSIE obtains its financial support from the seven 
participating Federal agencies. The current annual cost of 
this operation is approximately $420,000. 

The history of BSIE may be traced back to the dissolution 
of the wartime Office of Scientific Research and Development 
(OSRD) in 1946, at which time a number of Federal agencies 
undertook their own support of research in the medical 
sciences. Information exchanges were established within 
various agencies, the largest of which was the Office of 
Exchange of Information of the Public Health Service. 
When the amount of research supported by Federal agencies 
in the medical field had grown from $4.3 million in 1946 to 
$33 million in 1949, with corresponding growth in number of 
research organizations, investigators, and related multiple- 
submitted proposals, it became imperative that research 
project information be coordinated in order to prevent un- 
knowing duplication of sponsorship. The Medical Sciences 
Information Exchange was then founded as a cooperative 
venture in July 1950 within the Division of Medical Sciences, 
National Research Council. Support and administrative 
policy for the Medical Sciences Information Exchange 
(MSIE) was considered the joint responsibility of the six 
participating Federal agencies. In the fall of 1953 MSIE 
was shifted to the Smithsonian Institution and renamed 
the Bio-Sciences Information Exchange to take into account 
its expansion into the fields of biology and psychology. It 
is still governed and supported by seven supporting Federal 
agencies. 
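
The growth in Federal support cited above, from $4.3 million in 1946 to $33 million in 1949, can be expressed as an average annual rate. The sketch below assumes steady compound growth over the three years, which the report does not itself state.

```python
start_year, start_dollars = 1946, 4.3e6
end_year, end_dollars = 1949, 33e6

years = end_year - start_year
growth_factor = end_dollars / start_dollars
# Average compound rate of increase per year under the steady-growth assumption.
annual_rate = growth_factor ** (1 / years) - 1
```

On that assumption support grew nearly eightfold overall, or by roughly 97 percent a year, which is to say it almost doubled annually.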

Input data is generated as follows: Participating and 
cooperating agencies periodically send BSIE lists of new 
research projects they support, as soon as they are approved. 
Some Federal agencies such as the National Institutes of 
Health and the major fund-raising agencies also register 
their research proposals which BSIE holds until the request 
is either approved or disapproved. All data on the proposal 
are removed from the files if the project is disapproved for 
any reason. If approved, the project is fully indexed and 
entered into the BSIE system. 
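
The handling of proposals described above amounts to a small two-state record lifecycle. The following is a minimal sketch under that reading; the class name, method names, and record keys are hypothetical and do not describe the Exchange's actual procedures.

```python
class Registry:
    """Holds pending proposals and active projects in separate files."""

    def __init__(self):
        self.proposals = {}
        self.active = {}

    def register_proposal(self, key, data):
        """Hold a proposal until the granting agency acts on it."""
        self.proposals[key] = data

    def decide(self, key, approved):
        """Remove the proposal; keep its data only if it was approved."""
        data = self.proposals.pop(key)
        if approved:
            self.active[key] = data
        # A disapproved proposal is discarded and leaves no trace.


registry = Registry()
registry.register_proposal("P-1", {"title": "hypothetical study A"})
registry.register_proposal("P-2", {"title": "hypothetical study B"})
registry.decide("P-1", approved=True)
registry.decide("P-2", approved=False)
```

After both decisions the proposal file is empty, the approved project appears among the active projects, and nothing remains of the disapproved one.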

The one-page reporting form “Notice of Research Project”
records the name of the supporting agency, the title of the
research project, the name, titles, and affiliation of all the
professional personnel engaged on the project, the name and 
address of the institution at which the research is being done, 
a 300-word technical summary of the proposed work, the 
dates and other information on the financial support given 
the project (except for private intramural research), and the 
identity of the professional school with which the project 
should be identified. 
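
The contents of the reporting form can be pictured as a single record type. The sketch below follows the items listed above; the field names are assumptions chosen for illustration, and the actual layout of the form is not given in this report.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SUMMARY_WORD_LIMIT = 300  # the form calls for a 300-word technical summary


@dataclass
class NoticeOfResearchProject:
    supporting_agency: str
    title: str
    # (name, title, affiliation) for each professional on the project
    personnel: List[Tuple[str, str, str]]
    institution: str
    institution_address: str
    summary: str
    # Financial support is omitted for private intramural research.
    support_dates: Optional[str] = None
    support_amount: Optional[float] = None
    professional_school: Optional[str] = None

    def summary_within_limit(self) -> bool:
        """True when the technical summary is no longer than 300 words."""
        return len(self.summary.split()) <= SUMMARY_WORD_LIMIT


notice = NoticeOfResearchProject(
    supporting_agency="Hypothetical Agency",
    title="Hypothetical project title",
    personnel=[("A. Investigator", "Professor", "Hypothetical University")],
    institution="Hypothetical University",
    institution_address="Hypothetical City",
    summary="Purposes, techniques, and substances to be used.",
)
```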

The summary description includes the purposes of the 
project, techniques and substances to be used, as well as 
possible relationships to other aspects of research in various 
fields. No attempt is made to validate the summary which 
is prepared by the responsible investigators; information on 
funds and other administrative detail are provided by the 
granting agencies. Accuracy of the technical content de- 
pends in large measure on the reports sent in by the investi- 
gators, but the subject matter specialists have an important 
role in assuring clarity and uniformity of the material by 
additional notes as they review entries for cross-indexing. 

Project information is subject-matter cross indexed in as 
many as 15 subcategories of research fields by the BSIE 
professional staff. These subject-matter specialists use 
scientific concepts to determine the interrelationships of the 
information contained in the summary of the investigator, 
to the complex series of indexes which have more than 6,000 
titles. The BSIE staff also attempts to anticipate future 
uses of the information when determining the ways in which 
the information will be cross indexed. These subject-matter 
indexes record methods, substances, biological specimens, 
as well as purposes of the project. 

Other indexes summarize the research programs of agen- 
cies, research institutions, the investigators, and the States 
involved in research projects. Dollar totals of research 
support are available by subject, agency, State, and in- 
vestigator and institution. 
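
The cross indexing and the dollar summaries described above can be sketched as an inverted index plus grouped totals. The project records, subject headings, and dollar amounts below are hypothetical, and the structure is only one plausible way of representing what the punchcard files accomplish.

```python
from collections import defaultdict

MAX_SUBCATEGORIES = 15  # a project is cross indexed in as many as 15

# Hypothetical project records; none is taken from the BSIE files.
PROJECTS = [
    {"id": 1, "agency": "AEC", "state": "Tenn.", "dollars": 20000,
     "subjects": ["radiation effects", "genetics"]},
    {"id": 2, "agency": "NSF", "state": "Ohio", "dollars": 12000,
     "subjects": ["genetics"]},
    {"id": 3, "agency": "AEC", "state": "Ohio", "dollars": 8000,
     "subjects": ["radiation effects", "hematology"]},
]


def build_subject_index(projects):
    """Map each subject heading to the set of project ids filed under it."""
    index = defaultdict(set)
    for project in projects:
        if len(project["subjects"]) > MAX_SUBCATEGORIES:
            raise ValueError("too many subcategories for one project")
        for subject in project["subjects"]:
            index[subject].add(project["id"])
    return dict(index)


def dollar_totals(projects, key):
    """Total support grouped by a single-valued field such as agency."""
    totals = defaultdict(int)
    for project in projects:
        totals[project[key]] += project["dollars"]
    return dict(totals)


subject_index = build_subject_index(PROJECTS)
by_agency = dollar_totals(PROJECTS, "agency")
by_state = dollar_totals(PROJECTS, "state")
```

A request for everything filed under one heading is then a single lookup in `subject_index`, and the agency and State summaries fall out of the same records without a separate file.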

The “Notice of Research Project” is reproduced and kept 
in open files for easy access, but certain key elements of 
project data are transferred by clerical staff to punchcards 
for storage sorting and retrieval by various criteria. 

A file of the terminated projects is maintained on microfilm 
utilizing an index identical to that for active projects. This 
system enables the staff to scan and if desired quickly reproduce
copies of the original “Notice of Research Project.”
These records also make it possible to trace the research 
support given individual investigators, institutions, and 
localities, and to furnish those making inquiries with a com- 
plete picture of recently completed but perhaps unpublished 
work as well as active projects relevant to a special field of 
interest. 

The services of BSIE are available without charge to all 
participating and cooperating agencies and to individual 
investigators associated with recognized research institutions.
Information may be obtained by correspondence or
by phone.




The services of BSIE are utilized extensively by granting 
agencies to determine the amounts and sources of support 
given individuals, institutions, localities, and types of research 
as a guide to a balanced program and equitable sponsorship. 
By this means, gaps in program or need for multidiscipline 
approaches are readily detected. The identity and biograph- 
ical information regarding individuals and institutions 
engaged in research is given only to granting agencies.

Committees planning symposia and surveys of research 
may use BSIE to find out what is being done, by whom, 
and where. 

The Exchange limits the use of its data to investigators 
associated with recognized research institutions. Copies 
of the pertinent “Notices of Research Project” are sent in
answer to most questions which puts investigators interested 
in related and similar research problems in touch with each 
other. 

General restrictions regarding the use of BSIE index infor- 
mation include an overall prohibition against the use of the 
data for publication or publication reference. BSIE also 
takes cognizance of any additional restrictions which may 
be placed on data contributed from granting agencies. 
No fiscal data are sent to individual investigators. 
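
The release restrictions described in the preceding paragraphs can be read as a filter applied to each record before it leaves the Exchange. The sketch below is one such reading: the field names and requester categories are assumptions, and it covers only the two rules stated explicitly, namely that fiscal data are not sent to individual investigators and that biographical information goes only to granting agencies.

```python
FISCAL_FIELDS = {"support_amount", "support_dates"}
BIOGRAPHICAL_FIELDS = {"biography"}


def release(record, requester):
    """Return the portion of a project record a requester may receive."""
    if requester == "granting_agency":
        return dict(record)
    if requester == "investigator":
        withheld = FISCAL_FIELDS | BIOGRAPHICAL_FIELDS
        return {k: v for k, v in record.items() if k not in withheld}
    raise ValueError("unrecognized requester: " + requester)


record = {
    "title": "Hypothetical project title",
    "summary": "Hypothetical summary.",
    "support_amount": 15000,
    "support_dates": "1959-1960",
    "biography": "Hypothetical biographical note.",
}
for_agency = release(record, "granting_agency")
for_investigator = release(record, "investigator")
```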


No attempt is made to coordinate the BSIE index data 
with subsequent scientific reports. The index does not record
results or documentation. Knowledge of active or
completed research projects and of names of investigators
is depended upon to give pertinent leads to any reports
based on these projects. 

The BSIE as a clearinghouse welcomes the participation 
and the cooperation of any agencies who support research 
in the life sciences. At the present time, it is the only 
central source for such information in the United States. It 
includes research being undertaken abroad whenever the work 
is supported by a cooperating agency. 

Plans to improve service by expanded data storage 
facilities and prompt retrieval are now being implemented 
by transfer of information from punched cards to magnetic 
tape. Data processing will be accomplished with a Bur- 
roughs 205 machine which was obtained after considerable 
study of the needs and finances of the BSIE. It is expected 
that all the current data will be translated by January 1960. 

Current size of budget and staff of BSIE makes it necessary 
to sharply limit replies to general requests from the interested
public for comprehensive analyses and reviews.
BSIE does provide such services to its cooperating agencies. 


Status of registration of extramural and intramural Government research 


Also, in order to clearly depict the scope of BSIE operations, 
Dr. Stella Leche Deignan, Director, at the request of the subcommittee, 
submitted a list showing the extent and status of registration with
BSIE of extramural and intramural Government research, which
follows:


AIR FORCE 


Arctic Aeromedical Laboratory
(Registers contracts only; latest registration May 1958.)
Aeromedical Laboratory, Air Materiel Command
(Registers contracts and intramural research; latest
registration October 1958.)
School of Aviation Medicine
(Registers contracts and intramural research; up to date.)
Office of Scientific Research (including Brussels Office)
(Registers contracts only; up to date.)


DEPARTMENT OF THE ARMY 


TAGO, Personnel Research and Procedures Division, Personnel
Research Branch
(Registers intramural research only; up to date.)
Medical Research and Development Command
(Registers contracts only; up to date.)
Human Resources Research Office
(Registers contracts and intramural research; latest
registration November 1958.)


No registration of research with the BSIE 


Chemical Corps 


eee Corps 

Walter Reed Medical Center 

Fitzsimons General Hospital, Denver, Colo. 

Medical Nutrition Laboratory, Chicago, Ill. 

Army Medical Research Laboratory, Fort Knox, Ky. 
Brooke Army Medical Center, Fort Sam Houston, 


DEPARTMENT OF THE NAVY 


Office of Naval Research 
Biological Sciences Division 

(Registers contracts only; up to date.) 
Psychological Sciences Division 
(Registers contracts only; up to date.) 
Naval Research Laboratory (Bellevue), Washington, D.C.
(Registers intramural research only; latest registration
July 1958.)

U.S. Naval Training Device Center, Long Island, N.Y. 


(Registers intramural research only; latest registra- 
tions July 1956.) 










Bureau of Medicine and Surgery (registers intramural research 
only) 
The following installations have registered research with 
the BSIE: 
U.S. Naval Hospital, National Naval Medical Center, 
Bethesda, Md. 
(Latest registration January 1959.) 
U.S. Naval Medical School, National Naval Medical Center, 
Bethesda, Md. 
(Latest registration December 1957.) 
U.S. Naval Dental School, National Naval Medical Center, 
Bethesda, Md. 
(Latest registration November 1959.) 
U.S. Naval Medical Research Institute, National Naval 
Medical Center, Bethesda, Md. 
(Latest registration October 1959.) 
U.S. Naval Hospital, Camp Pendleton, Calif. 
(Latest registration January 1959.) 
U.S. Navy Prosthetic Research Laboratory, U.S. Naval 
Hospital, Oakland, Calif. 
(Latest registration June 1957.) 
Clinical Investigation Center, U.S. Naval Hospital, Oakland,
Calif. 
(Latest registration January 1959.) 
U.S. Navy Preventive Medicine Unit No. 5, U.S. Naval 
Hospital, San Diego, Calif.
(Latest registration May 1958.) 
U.S. Naval Training Center, San Diego, Calif. 
(Latest registration July 1959.) 
Dental Department, Administrative Command, U.S. Naval 
Training Center, San Diego, Calif. 
(Latest registration July 1957.) 
U.S. Naval Radiological Defense Laboratory, San Fran- 
cisco, Calif. 
(Latest registration October 1957.) 
Neuropsychiatric Unit, Medical Department, Marine Corps 
Recruit Depot, San Diego, Calif. 
(Latest registration June 1957.) 
U.S. Naval Medical Research Laboratory, U.S. Naval Sub- 
marine Base, New London, Conn. 
(Latest registration November 1959.) 
U.S. Navy Preventive Medicine Unit No. 1, Jacksonville, 
Fla. 
(Latest registration June 1957.) 
U.S. Navy Mine Defense Laboratory, Panama City, Fla. 
(Latest registration September 1957.) 
U.S. Naval School of Aviation Medicine, U.S. Naval Air 
Station, Pensacola, Fla. 
(Latest registration June 1959.) 
Dental Department, Administrative Command, U.S. Naval 
Training Center, Great Lakes, Ill.
(Latest registration August 1957.) 




U.S. Naval Medical Research Unit No. 4, U.S. Naval Training
Center, Great Lakes, Ill.
(Latest registration June 1959.)
Dental Department, Administrative Command, U.S. Naval 
Training Center, Bainbridge, Md. 

(Latest registration August 1957.) 

U.S. Naval Hospital, Chelsea, Mass. 
(Latest registration April 1959.) 

U.S. Naval Retraining Command, Portsmouth, N.H. 
(Latest registration October 1958.) 

U.S. Naval Hospital, St. Albans, Long Island, N.Y. 
(Latest registration October 1958.) 

U.S. Naval Medical Field Research Laboratory, Marine 
Barracks, Camp Lejeune, N.C.

(Latest registration April 1959.) 

Air Crew Equipment Laboratory, Naval Air Experimental 
Station, U.S. Naval Air Material Center, U.S. Naval 
Base, Philadelphia, Pa. 

(Latest registration October 1958.) 

Aviation Medical Acceleration Laboratory, Naval Air Development
Center, Johnsville, Pa.
(Latest registration June 1959.)
U.S. Naval Hospital, 17th and Pattison Avenue, Philadel- 
phia, Pa. 
(Latest registration June 1959.) 
U.S. Naval Hospital, Portsmouth, Va. 
(Latest registration June 1959.) 
U.S. Naval Medical Research Unit No. 3, Cairo, Egypt. 
(Latest registration September 1959.) 
The following installations have not registered research 
with the BSIE: 

U.S. Naval Medical Research Unit No. 1, University of 
California, Berkeley, Calif. 

U.S. Naval Hospital, Great Lakes, Ill.

U.S. Naval Unit, Army Chemical Center, Md. 

U.S. Naval Unit, Fort Detrick, Frederick, Md.

U.S. Naval Air Test Center, Air Medical Branch of Service 
Test, Patuxent River, Md. 

U.S. Naval Hospital, Camp Lejeune, N.C.

U.S. Naval Hospital, Navy No. 3923, FPO, San Francisco,
Calif.

U.S. Naval Medical Research Unit No. 2, Taipeh. 

U.S. Naval Station, Navy No. 188, FPO, New York, N.Y. 


ATOMIC ENERGY COMMISSION 
(Registers contracts only; up to date.) 
VETERANS’ ADMINISTRATION 


(Registers intramural research in all VA hospitals; 
up to date.) 




NATIONAL SCIENCE FOUNDATION 


(Registers contracts only; up to date.) 


DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE 


Public Health Service 
National Institutes of Health 


(Registers extramural research only; up to date.)
Air Pollution Medical Program 


(Registers extramural and intramural research; latest 
registration January 1958.) 
Division of Nursing Resources 
(Registers intramural research only; latest registration
July 1958.) 
Office of Education 
(Registers extramural research only; up to date.) 
Office of Vocational Rehabilitation 


(Registers extramural research only; up to date.) 


DEPARTMENT OF AGRICULTURE 


(No registration of research with the BSIE.) 


DEPARTMENT OF THE INTERIOR 


Fish and Wildlife Service 
(No registration of research with the BSIE.)


Further details relative to the operations and program of the BSIE 
and the information exchange services being established in the physical 
sciences are included in the report of the National Science Foundation 
(pp. 142-144), and in the forefront of this report (pp. 13-15). 


CENTRAL INTELLIGENCE AGENCY (CIA)


As set forth in the staff summary of this report (p. 16), the CIA has
been most cooperative in presenting full details relative to its opera-
tions in the field of science information and data processing. In
response to a request of the staff, the following comments, which set
forth a brief outline of these activities, were forwarded for inclusion in
the report:


The cycle of organizational activity for intelligence pur-
poses extends from the collection of selected information to
its direct use in reports prepared for policymakers. Between
these beginning and end activities there lie a number of func-
tions which can be grouped under the term information
processing. These functions include the identification, re-
cording, organization, storage, recall, conversion into more
useful forms, synthesis, and dissemination of the intellectual
content of the information collected. The ever-mounting
volume of information produced and promptly wanted and
the high cost of performing these manifold operations are
forcing a critical review of current practices in the processing
field.

To cope with its information handling problems, the 
Central Intelligence Agency has over the past 13 years 
developed an information processing center which com- 
prehensively indexes and stores that information which is 
collected and, as a service of common concern, renders daily 
support to analysts at work in all parts of the U.S. Govern- 
ment’s intelligence community. This central reference serv- 
ice organizationally consists of a central library of books and 
documents, specialized libraries or “registers” concerned
with biographic, graphic and industrial information, a 
document center to which and from which the very exten- 
sive documentary flow comes, and a machine unit which 
acts as a nucleus supporting the office through controlled
manipulation of data by machine methods.

Efficient and economical storage and retrieval of informa- 
tion is by all odds the toughest of the information processing 
problems; millions are being spent on it by the research librar-
ies of universities, of industry, and of government. For us 
this problem is particularly vexing since our document center 
alone receives thousands of different intelligence documents 
each week in numbers of copies running into the tens of 
thousands. This is exclusive of newspapers, press summaries, 
books, maps, and other such open material which is acquired 
by the library in an average of 200,000 pieces per month. The
open literature is obtained to meet the needs of our own 
analysts or those of 20 other U.S. Government agencies; that 
which is filed centrally in our library is handled as it would 
be in a conventional library, using the Library of Congress 
classification system. 

The classified documents are received from scores of differ-
ent major sources; the daily volume fluctuates and lacks uni-
formity in format, in reproduction media, in length and qual-
ity of presentation, and in security classification. As they
come in they must be read with an eye to identifying material
of interest to the many different customer receipt points; those
which have future retrieval value (approximately 50 percent) 
must be indexed and stored in such manner as to provide re- 
trieval pertinent to customer needs. This material is subject 
to control through IBM-punched cards. 

These IBM card files now contain over 40 million cards.
Since 1954 we have been miniaturizing the documents by 
microphotography and mounting them in apertures on IBM 
punched cards. Access to the document itself is indirect, 
through codes punched into the cards to indicate subject, 
area, source, security classification, date and number of each 
document. The data on cards retrieved in response to a par- 
ticular request is reproduced by photographic means in tape 
form and constitutes the bibliography given the customer. 
This system, which we instituted in 1947 (using microfilm
strips rather than aperture cards prior to 1954 and facsimile
rather than photo tape prior to 1959), we call the Intellofax
system; it represents pioneer work in the field of information
storage and retrieval.
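
The indirect access described above can be illustrated with a minimal modern sketch. It assumes each punched card is a record of coded fields pointing to one stored document, and that a request is answered by selecting the cards whose codes match and listing their document numbers as the bibliography. The field layout, the code values, and the `search` helper are hypothetical and are not taken from the Intellofax system itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexCard:
    """One index card: coded fields that point to a stored document."""
    number: str          # document number (pointer to the document image)
    subject: str         # hypothetical subject code
    area: str            # hypothetical area code
    source: str          # hypothetical source code
    classification: str  # hypothetical security classification code
    date: str            # ISO date string, so text order equals date order


def search(cards, **wanted):
    """Return document numbers of cards matching every requested code, oldest first."""
    hits = [c for c in cards
            if all(getattr(c, field) == value for field, value in wanted.items())]
    return [c.number for c in sorted(hits, key=lambda c: c.date)]


# Illustrative data only.
cards = [
    IndexCard("D-001", "CHEM", "AREA-1", "SRC-A", "U", "1959-03-02"),
    IndexCard("D-002", "PHYS", "AREA-1", "SRC-B", "U", "1959-04-11"),
    IndexCard("D-003", "CHEM", "AREA-2", "SRC-A", "U", "1958-12-20"),
    IndexCard("D-004", "CHEM", "AREA-1", "SRC-B", "U", "1958-07-05"),
]

bibliography = search(cards, subject="CHEM", area="AREA-1")
print(bibliography)
```

The point of the sketch is only that the cards, not the documents, are searched; the document itself is fetched afterward by its number.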

Foreign scientific information is a part of our total volume 
acquisitions and is important to us in the discharge of some 
of CIA’s direct responsibilities to the National Security 
Council. We are also called upon to perform certain services 
of common concern for the intelligence community. The 
problem we have concerning scientific information is four- 
fold: (1) knowing which scientific publications we need and 
which are available; (2) acquiring these as promptly as 
possible; (3) disseminating them or the information in them 
in a form and in a language facilitating their use; (4) organiz- 
ing the information in such manner as to permit its rapid 
recall when needed. 

Generally speaking, it is possible to obtain copies of all sig- 
nificant scientific publications published abroad; exceptions 
are those coming under foreign security classifications and 
those unclassified publications whose distribution is restricted 
by Communist bloc countries. The acquisition of doctoral 
dissertations also presents difficulties. Procurement of the 
desired publications is undertaken by the Foreign Service of 
the United States with community needs coordinated through 
an interagency committee of the U.S. Intelligence Board. 
This committee includes, in addition to intelligence com- 
munity representatives, members from the three national 
libraries (Library of Congress, National Library of Medicine,
and the Department of Agriculture Library), the National 
Science Foundation and the U.S. Information Agency. 
Through this mechanism the needs of all are known, tasks 
can be allocated, and performance evaluated. 

We are often asked whether we make these publications 
available to the public. The answer is that we do. Single- 
copy material which we retain is given to the Library of 
Congress for microfilming for its collection; duplicate copies 
are passed on for direct use. Material which has been ex- 
ploited by our research personnel and no longer needed is 
redistributed and thenceforth available to users of the 
national libraries. Of the total publications we purchase or 
secure abroad, 98 percent is made available to the public 
through our cooperative arrangements with the national 
libraries. 

The Central Intelligence Agency contributes regularly 
material which is incorporated into the national libraries’ 
published indexes: the Monthly List of Russian Accessions, 
the East European Accessions Index, the Bibliography of 
Agriculture, and the Current List of Medical Literature. 
Similarly CIA maintains a continuous program of reporting 
selective extracts from scientific and technical literature. 
This information is carried in a semimonthly publication, the 
Scientific Information Report, which is made available to, 
and reproduced by, the Office of Technical Services, Depart- 
ment of Commerce, which offers it to U.S. science and industry
on subscription.










To meet our internal needs, we require considerable trans- 
lation work. Soviet publication in the field of science and 
technology has been estimated in excess of 700 million words 
per year. Presently, only a small portion of this volume,
some 7 percent, is being translated. In earlier years fewer
publications were available and a great deal less translated.
This naturally led us to the possibilities of machine trans-
lation, and since 1952 CIA has been promoting machine
translation research. Since 1956, we have provided finan- 
cial support to the machine translation program of the In- 
stitute of Language and Linguistics, Georgetown University. 

The major machine translation emphasis to date has been 
from Russian to English in the field of chemistry, and from 
French to English in nuclear physics. A capability using
general-purpose computers now exists for production of
usable text (requiring some postediting) at rates in excess
of 30,000 words per hour, depending on which computer is
used. Costs are comparable to human translation and, of 
course, speed is much greater, a human translator produc- 
ing about 2,600 words per day. Present problems involve 
input to the computer to keep it working at optimum since 
today the input must be hand-punched, and a key-punch 
operator prepares machine input at the rate of 6,000 to 7,000
words per day. Optimum computer use therefore requires 
large key-punch staffs. 
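
The staffing problem can be made concrete with a short calculation. The sketch below assumes that the machine rate of 30,000 words is per hour and that the computer works an 8-hour day; both are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def operators_needed(machine_words_per_hour, machine_hours_per_day,
                     operator_words_per_day):
    """Key-punch operators required to keep the computer supplied for one day."""
    daily_demand = machine_words_per_hour * machine_hours_per_day
    return math.ceil(daily_demand / operator_words_per_day)


MACHINE_RATE = 30_000  # words per hour (assumed unit)
SHIFT_HOURS = 8        # assumed length of the computer's working day

for operator_rate in (6_000, 7_000):  # words per operator per day, from the text
    print(operator_rate, operators_needed(MACHINE_RATE, SHIFT_HOURS, operator_rate))
```

Under these assumptions the computer consumes 240,000 words a day and needs between 35 and 40 key-punch operators to stay busy. The same volume would occupy roughly 92 human translators working at 2,600 words per day.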

The solution to this problem is a character-sensing device
which can feed the foreign language directly into the com-
puter. CIA interest in mechanical translation, character-
sensing research, and other applications of mechanical or
electronic equipment to information processing is coordinated
with other intelligence services through another interagency
committee of the U.S. Intelligence Board. And again, this
group is tied in with the nonintelligence community by in- 
cluding a representative from the National Bureau of Stand- 
ards who is cognizant of work being done in electronic data 
processing research by and for many Government agencies, 
private research organizations, and commercial firms. 

Currently we are using microphotography and our Intello- 
fax system as previously described. We are also experi-
menting with Minicard equipment manufactured by Eastman 
Kodak and initially developed for the Air Force. A special 
large-scale application has been developed by IBM referred 
to as the WALNUT system, details of which are available 
from the company. Other components of the intelligence 
community have sponsored important developmental work 
in automatic abstracting and automatic dissemination. 

Trends in the field of information processing would appear 
to support the following conclusions: (1) Channels for pro- 
curing publications and techniques for storing the physical 
document are extensive and well developed; the outlook is 
for expansion and intensification of present methods; (2) the 
type of information service coming into being will demand 
action primarily in preparing reference personnel to give 
assistance of higher quality than is given today; (3) special-
ized schemes will be developed to fit the needs of specialized
users where solutions to problems call for a search of the 
literature; and (4) the present and future demands for 
reference services will lead to increased use of machines 
where these can be introduced without jeopardizing the per- 
formance of essential intellectual operations. 


DEPARTMENT OF COMMERCE 


In response to the committee’s request, Mr. Philip A. Ray, Under 
Secretary of Commerce, submitted, on May 9, 1960, summaries of 
operations and detailed supporting data relative to the activities of 
the Department of Commerce in the field of science information 
processing. In addition to the summaries of these activities, which 
follow, there are numerous references throughout this report relative 
to the activities and contributions made by the National Bureau of 
Standards in the development of the information retrieval programs 
of other agencies.


NATIONAL BUREAU OF STANDARDS 


The activities of the National Bureau of Standards in the 
areas of assembling, coordinating, disseminating, selecting, 
and retrieving technical data and literature may be divided 
into the following principal categories: 

1. Research and general advisory services. 

2. Data services. 

3. Equipment developments. 

4. NBS publications. 

Representative activities in each of these categories are 
discussed below. 

1. Research and advisory services.—Since the early days of 
the development of electronic computer technology, the Data 
Processing Systems Division of the Bureau has conducted 
research and development in the broad area of automatic 
information processing, and has served the other agencies of 
Government as technical adviser on the application of elec-
tronic computing techniques to a wide variety of data process- 
ing problems. This service has extended from technical 
supervision of the contracts to develop the first three Univacs 
(for Census, the Office of Air Comptroller, and the Army Map 
Service), to the evaluation of system proposals for mechani- 
zation of important operations in intelligence data processing 
for the Department of Defense. 

During the past 10 years, such services have been ren- 
dered to a number of other Government agencies, of which 
the following examples are representative but by no means 
comprehensive: 

Department of Defense:
    Air Force: Air Materiel Command.
    Army:
        Chemical Corps.
        Signal Corps.
        Corps of Engineers.
    Navy:
        Bureau of Supplies and Accounts.
        Bureau of Ships.
        Special Devices Center.
Department of Commerce:
    U.S. Weather Bureau.
    U.S. Patent Office.
    Public Roads Administration.
Department of the Treasury:
    Savings Bond Division.
    Check Reconciliation.
Post Office Department.
Department of Health, Education, and Welfare: Social Security
    Administration.
Federal Aviation Administration.
Federal Communications Commission.
Federal Reserve Board.
General Services Administration.
Home Loan Bank Board.
National Science Foundation.
Public Housing Administration.

In addition to its general interests in data processing appli- 
cations, the Division has been active in the field of potential 
mechanization of information selection and retrieval opera- 
tions. With respect to these activities, the Special Advisory
Committee of the National Academy of Sciences, which
studied the NBS programs, stated: “The Bureau has been,
and is now, the established and natural fulcrum for a balanced
Federal effort in research relating to information systems.”

For the past 4 years, the Data Processing Systems Division 
has been engaged in a cooperative program in the U.S. 
Patent Office, looking toward the mechanization of patent 
search operations. Both short- and long-range research have 
been vigorously pursued. Specific contributions by the 
Bureau to this program have included demonstration of an 
element-by-element topological structure search, wherein a 
specified chemical structure or substructure can be located 
and recognized, however embedded it may be in more com- 
plex structures. Studies and machine experiments have 
been conducted to check, sort, assemble, and edit input data 
and file data (that is, encoded patent information) for errors 
and inconsistencies. These computer studies have been di- 
rected primarily to possibilities for pilot production of patent 
search in selected chemical arts. 
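
An element-by-element structure search of the kind mentioned above can be sketched as a backtracking match between two labeled graphs. The sketch below is not the Bureau's program; it assumes a structure is given as a list of atom symbols plus a list of bonds between atom indices, and the function names are hypothetical. Bond types and hydrogens are ignored for brevity.

```python
def _adjacency(atom_count, bonds):
    """Build a neighbor set for each atom from a list of (a, b) index pairs."""
    neighbors = [set() for _ in range(atom_count)]
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def find_substructure(query_atoms, query_bonds, target_atoms, target_bonds):
    """Return a dict mapping query atom index -> target atom index, or None."""
    q_adj = _adjacency(len(query_atoms), query_bonds)
    t_adj = _adjacency(len(target_atoms), target_bonds)

    def extend(mapping):
        q = len(mapping)  # index of the next query atom to place
        if q == len(query_atoms):
            return dict(enumerate(mapping))
        for t in range(len(target_atoms)):
            if t in mapping or target_atoms[t] != query_atoms[q]:
                continue
            # Every bond from q to an already placed query atom must also
            # exist between the corresponding target atoms.
            if all(mapping[p] in t_adj[t] for p in q_adj[q] if p < q):
                found = extend(mapping + [t])
                if found is not None:
                    return found
        return None

    return extend([])


# Illustrative data: look for a C-O fragment inside the chain C-C-O.
match = find_substructure(["C", "O"], [(0, 1)],
                          ["C", "C", "O"], [(0, 1), (1, 2)])
print(match)
```

Because each candidate atom is checked against the atoms already placed, the fragment is found however deeply it is embedded in the larger structure, at the cost of a search that can grow quickly for large files.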

Concurrently, broader and more comprehensive search 
programs for the chemical arts have been developed. On 
the long-range side, research has proceeded in machine scan- 
ning and processing of pictorial and graphic data, such as 
chemical structure and electrical circuit diagrams, maps, and 
photographs. The Bureau has promoted related basic re- 
search in techniques of linguistics, logic, and mathematics 
that may prove applicable to improved systems for informa- 
tion selection and retrieval. Examples include experiments 
in machine analysis of the syntactic structure of English
sentences, and contracts for research in the analysis of natural
and artificial languages and of possible applications of a lat-
tice theory to the possibilities of a general theory of infor-
mation storage, selection, and retrieval.

Other continuing work includes study of possibilities for 
the interaction of man and machine during search, so that 
search prescriptions may be reformulated in accordance with 
results obtained. For example, a machine program has been 
demonstrated which searches a sample file in a specially in- 
structed manner, reviews the results of the search, and sug- 
gests modifications of the original program. 

The Data Processing Systems Division is also cooperating 
with the National Science Foundation in the establishment 
of the Research Information Center and Advisory Service 
on Information Processing. This cooperation has as its 
objectives: 

(1) The providing of information on research and 
development activities in the field of automatic process- 
ing of information that is expressed in natural languages 
or in diagrams or other graphic forms. 

(2) The providing of advice and assistance on the 
use of machines for scientific literature storage and 
search and related applications. 

(3) The promotion of closer working relationships be- 
tween research workers active in the field. 

(4) The conduct of basic and applied research in the
field.

This new service is part of a broad program being developed
by the National Science Foundation to improve the quality
of scientific information services and thus to shorten the time
that scientists must spend in searching literature.

The Bureau’s activities in this cooperative program include
the exploration of sources of information on current and
proposed research and development activities, the analysis
of these developments and, from time to time, the
preparation of reports of research progress and of state-of-
the-art reviews, the experimental intercomparison of various
techniques for automatic data and document selection, and
the furnishing of advice on machine capabilities. A refer-
ence file is being established on current projects, workers in
the field, and available reports and publications. This file
itself may also be used for continuing experimentation in
the use of machines as aids in indexing, search, and selection.

A second important program related to the storage and
retrieval of research information, especially in the field of
instrumentation, is underway in the NBS Office of Basic
Instrumentation (OBI). This Office has done pioneering
work in the development of the Peek-a-Boo system, including
related equipment, which stores vast amounts of informa-
tion in miniaturized form and provides a relatively simple
basis for operating such systems for small and medium-sized
data collections. The system is now in active use to store
information on instrumentation which becomes the center of
OBI’s responsibility to serve as a clearinghouse in instru-
mentation research.
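
The principle behind a peek-a-boo (optical coincidence) file can be sketched briefly. Each card stands for one index term, and a hole at a given position means that the document with that number carries the term; when cards are superimposed, light passes only at positions punched on every card. The sketch below models a card as an integer bit mask. The term names, document numbers, and function names are hypothetical.

```python
def punch(card, doc_number):
    """Return the card with a hole punched at the position for doc_number."""
    return card | (1 << doc_number)


def superimpose(cards):
    """Stack one or more cards; a position stays open only if open on all."""
    stack = cards[0]
    for card in cards[1:]:
        stack &= card
    return stack


def holes(card):
    """List the document numbers whose positions are open on a card."""
    return [i for i in range(card.bit_length()) if (card >> i) & 1]


# Illustrative postings: term -> numbers of the documents indexed under it.
postings = {
    "thermometer": [3, 7, 12],
    "pressure": [7, 9, 12],
    "recorder": [2, 12],
}
term_cards = {}
for term, doc_numbers in postings.items():
    card = 0
    for number in doc_numbers:
        card = punch(card, number)
    term_cards[term] = card

both = holes(superimpose([term_cards["thermometer"], term_cards["pressure"]]))
all_three = holes(superimpose(list(term_cards.values())))
print(both, all_three)
```

A search on several terms is thus a single coincidence test over the whole collection, which is why the method suits small and medium-sized files.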







2. Data services.—As a special and related category, the 
Bureau actively maintains a number of small-scale data 
centers which provide highly technical information in spe- 
cialized disciplines. These centers are closely related to the 
Bureau’s research programs and interests and stem from the 
Bureau’s responsibility to provide data on the properties of 
materials when they are not available with sufficient accuracy 
elsewhere. Examples of this type of activity are: Tube 
Information Service; Cryogenic Data Center; Electron De-
vices Data Service; Thermodynamics Data Section; and the 
“Catalog of Atomic Spectra.” These centers are main- 
tained by capable scientists with a very specialized interest 
and the service is designed for workers in the field of interest. 

3. Equipment developments.—Some of the major and funda- 
mental advances in the machinery related to the high-speed 
handling of data and information result from Bureau research 
programs. These developments are often based upon co- 
operative programs with other Government agencies who 
have a specific need for a special-purpose development to 
meet an operating problem. Examples of this activity are
the “Rapid Selector,” which has been adapted to meet the
need of the Navy’s Bureau of Ships in handling massive re-
ports and technical information, and Fosdic, the fundamental
film-sensitive device which stores vast collections of informa-
tion for use by the Weather Bureau and the Bureau of the
Census. The literature furnished to the committee provides
information on a number of these developments. It should 
also be noted that the Bureau has contributed substantially 
to the computer developments and to computer componentry. 

4. NBS publications —The major output of NBS is scien- 
tific information, tables of technical data, and papers describ- 
ing the results of Bureau research. The major publication of 
the Bureau is the Journal of Research which has earned for 
itself a fine record of contribution to the advance of science 
and technology. The journal is divided into four major sec- 
tions dealing with physics and chemistry; mathematics and 
mathematical physics; engineering and instrumentation; and
radio propagation. 

In addition to this major periodical, the Bureau issues a 
monthly periodical called the Technical News Bulletin. 
This is a digest publication designed for technical and trade 
publications, research management, and technologists. It 
summarizes and highlights the current work of the Bureau’s 
scientists. The Bureau also issues a number of nonperiod-
icals on subjects resulting from Bureau programs, such as the
publications in the applied mathematics series, principally
mathematical tables of interest to applied scientists and
engineers; handbooks giving data on safe practices; and
monographs and circulars on various subjects. Finally, the
Bureau’s Boulder Laboratories issues regular basic radio 
predictions which, very much like weather forecasts, predict 
atmospheric conditions for radio transmission. 

Most of our information is published not in Government 
publications but in the publications of professional societies.
We encourage this as the best means for disseminating our
information and getting our results into use. Last year we 
published 264 articles in Bureau publications and 388 articles 
in professional scientific journals. 


BUREAU OF PUBLIC ROADS 


The Bureau of Public Roads first began to explore the 
application of electronic computers to highway engineering 
and administrative problems in 1955 as a means of increasing 
engineering productivity to meet the demands of the greatly 
expanded highway programs then contemplated. This 
study was directed toward the operations of the Bureau of 
Public Roads as well as the State highway departments in 
connection with their Federal-aid highway systems for which 
the Federal Government reimburses from 50 to 90 percent 
of the cost. Capital expenditures on the Federal-aid systems 
were about $5¾ billion last year.

From these initial studies there has developed a major 
evolution in highway engineering methodology from which 
substantial savings in money, manpower, time and more 
effective quality control are now being realized. The con- 
tinuing work of the Bureau in this area, which is centered in 
the Division of Development, includes the development of 
specific applications of electronic equipment in highway 
engineering and administration either independently or in 
cooperation with other offices of the Bureau, State highway 
departments, educational institutions and others; the pro- 
motion of the utilization of such equipment in highway 
operations; assistance to State highway departments in 
establishing or strengthening their electronic computer opera- 
tions or in specific applications in engineering or administra- 
tion; and the establishment and operation of a national 
library of electronic computer programs developed for use 
in the highway field. There are now about 700 electronic 
computer programs in use including traverse computation 
and adjustment, earthwork quantities, horizontal and 
vertical alinement, roadway design, interchange geometrics, 
bridge geometric design, composite beam analysis, continuous 
beam bridge design, prestressed girder design, slope stability 
analysis, bid analysis, traffic forecasting, traffic assignment,
speed check analysis and many others. A copy of our most 
recent periodic listing of library programs, “Electronic Com-
puter Program Library Memorandum No. 7,” has been
furnished to the committee.

Very substantial progress has been made. At the present 
time, 41 State highway departments have electronic com- 
puters installed and are using them in their day-to-day opera- 
tions. The Bureau of Public Roads has two computers, 
one in the Washington office and one in the regional office 
at Portland, Oreg. 

While we have not found need for the literature search type 
of information processing, we do have applications which 
require data storage and retrieval as a basic element. For
example, in determining the most economical highway loca-
tions between designated terminal points, costs of alternative 
routes are developed for comparison. By using electronic 
computers and aerial photographs in combination, we have 
been able to make semiautomatic read-offs from aerial photo- 
graphs and make more exhaustive analyses of alternative 
routes in less time than was previously required for a very
limited analysis. In this process, the spatial coordinates of
a very large number of points on the surface of the ground 
are obtained from aerial photographs mounted in precise 
stereoplotting instruments, automatically punched into cards 
or tape and then stored in the memory of an electronic 
computer. Controlling dimensional data for alternative 
horizontal and vertical alinements are then entered into the 
computer. The computer automatically retrieves from its 
memory the appropriate terrain points for each alternative 
alinement and produces earthwork quantities. The same 
principle is now being applied to the development of right-
of-way costs. Ultimately these and other programs will be 
combined to make possible a complete economic analysis. 
This work is being done at the Massachusetts Institute of 
Technology as a cooperative development project with the 
Massachusetts Department of Public Works and the Bureau 
of Public Roads. 
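
The retrieval-and-compute step described above can be sketched as follows. The report does not say how the earthwork quantities are calculated; the sketch uses the average end area method, a common textbook technique, purely as an illustration. It assumes the stored terrain is a table of ground elevations sampled at equal spacing across each station, and that an alternative alinement is a flat roadbed elevation at each station. All names and figures are hypothetical.

```python
def section_areas(ground_elevations, grade_elevation, spacing):
    """Approximate cut and fill areas of one cross-section by vertical strips."""
    cut = sum(max(g - grade_elevation, 0.0) for g in ground_elevations) * spacing
    fill = sum(max(grade_elevation - g, 0.0) for g in ground_elevations) * spacing
    return cut, fill


def earthwork(terrain, alinement, spacing):
    """Total cut and fill volumes for one alinement by average end areas.

    terrain:   dict station -> list of stored ground elevations
    alinement: dict station -> design grade elevation at that station
    """
    stations = sorted(alinement)
    # Retrieve from the stored terrain only the points this alinement needs.
    areas = [section_areas(terrain[s], alinement[s], spacing) for s in stations]
    cut = fill = 0.0
    for i in range(len(stations) - 1):
        length = stations[i + 1] - stations[i]
        cut += length * (areas[i][0] + areas[i + 1][0]) / 2
        fill += length * (areas[i][1] + areas[i + 1][1]) / 2
    return cut, fill


# Illustrative data: two stations 100 units apart, points 5 units apart.
terrain = {0: [10.0, 10.0], 100: [12.0, 12.0]}
route_a = {0: 10.0, 100: 10.0}
route_b = {0: 11.0, 100: 11.0}
print(earthwork(terrain, route_a, 5.0), earthwork(terrain, route_b, 5.0))
```

Because the terrain is stored once and each alinement only selects from it, any number of alternative routes can be compared without new field measurements.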

In another application, developed in the Bureau of Public 
Roads, large volumes of travel data for a metropolitan area 
are analyzed to determine current travel desire lines. This 
data is then projected into the future to estimate travel 
volumes which will have to be provided for at any specific 
future date. A further analysis is then made to determine
the street and freeway system best fitted to carry these 
estimated volumes for the year 1975 or any future year. 
This forms the basis for long-range transportation planning 
for the area. This series of analyses is feasible only on a 
large storage capacity, high-speed, electronic computer. 
Masses of data must be stored and retrieved during each 
phase of the process. 

Still another application, also developed in the Bureau of 
Public Roads, requires the storage and retrieval of large 
volumes of cost and performance data for large fleets of high- 
way construction and maintenance equipment. This data 
is analyzed to obtain unit costs and comparative performance 
data for the many types and classes of equipment involved. 
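
A minimal sketch of the unit-cost step follows. It assumes each stored record gives an equipment class, a cost in dollars, and the hours operated, and that the unit cost wanted is dollars per operating hour; the record layout and function name are hypothetical and are not taken from the Bureau's program.

```python
from collections import defaultdict


def unit_costs(records):
    """Return cost per operating hour for each equipment class."""
    cost = defaultdict(float)
    hours = defaultdict(float)
    for equipment_class, dollars, hours_operated in records:
        cost[equipment_class] += dollars
        hours[equipment_class] += hours_operated
    # Classes with no recorded hours are skipped to avoid dividing by zero.
    return {c: cost[c] / hours[c] for c in cost if hours[c] > 0}


# Illustrative records only.
records = [
    ("grader", 1200.0, 100.0),
    ("grader", 800.0, 100.0),
    ("truck", 900.0, 300.0),
    ("roller", 50.0, 0.0),
]
print(unit_costs(records))
```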

In any type of engineering or scientific analysis involving 
large quantities of experimental or observed data, such as 
the problems described, data storage and retrieval is, of
course, an essential part of the analysis. We have the same
situation in accident studies, parking studies, and other 
analyses of a statistical nature. 

As a matter of further interest, we are also investigating 
applications of nuclear science in the highway field particu-
larly as a means of quality control. For example, using a
nuclear device it is now possible to determine density and 
moisture content of embankment lifts in a matter of min-
utes with no interference with the compacting equipment.
Another device utilizes infrared rays to check paints for con- 
formance to standards. Methods and devices have also 
been developed for the nondestructive testing of welds and 
castings using ultraviolet rays. Still other applications 
are in the exploratory stage. 


COAST AND GEODETIC SURVEY 


The Coast and Geodetic Survey uses an IBM 650 mag- 
netic drum calculator for its scientific and engineering com- 
puting and also for its fiscal data processing. The punched 
cards used in fiscal operations are retained for the minimum 
period of time as required by GAO and then disposed of as 
surplus. Most of the cards used in the engineering data 
processing are disposed of when the computations have been 
completed. 

At the present time the only cards which are preserved 
are those involving magnetic observatory records of the 7 
observatories operated by the Bureau and the master file of 
150,000 worldwide field stations at which magnetic measure- 
ments have been made. This latter file represents the source 
material for the compilation of World Magnetic Charts. Re- 
trieval of any of this material—fiscal, engineering or scien- 
tific—is by means of standard punch card equipment. 

All original survey records and computations plus all 
survey sheets (topographic and hydrographic) are cataloged 
under classifications for subject matter and locality. These 
records can be easily found in either the Bureau or National 
Archives and the Federal Storage Center by means of this 
classification. This type of material is used by geodesists, 
photogrammetrists, oceanographers, geophysicists, cartogra- 
phers, and others in the normal activities of the Bureau. 

This same material is microfilmed for storage in selected 
depositories throughout the country. In case retrieval be- 
comes necessary, the material will be read directly from the 
films or from prints made from the films. 


WEATHER BUREAU 


For forecasting purposes weather data, observed all over 
the world, are a highly perishable commodity. They are 
collected in international code by the fastest available means 
of communication. In the United States these observations 
flow to the National Meteorological Center at Suitland, Md. 
A large part of the information, especially the upper air
observations, is automatically edited and is subjected to
an automatic accuracy check at very high speed by com- 
puter methods. A large digital electronic computer (IBM 
704) then performs an objective analysis of the data and 
prepares a numerical forecast of many important atmospheric 
parameters. This material forms the basis for complete 
analysis and forecast information for the Northern Hemi- 
sphere. Comprehensive forecast information is then broad-
cast in map form by facsimile to Weather Bureau and
military meteorological field offices for guidance in their local 
forecasts. Extensive and continuous development work 
is underway to automate all phases of meteorological work. 
This includes the automatic weather observing system of 
the Bureau (Amos) which is just becoming operational on a 
larger scale. Jointly with the Air Force and the Federal 
Aviation Agency, the Bureau is developing an integrated 
weather observing, transmitting and forecast system for 
air traffic control and terminal weather conditions. 

Research in numerical prediction and studies of the general 
circulation of the atmosphere will make use of the more pow- 
erful computers which are soon to become available (IBM 
7090 type, in the immediate future, and Stretch type, within
a few years). The integration of an envisaged incessant
flow of data from meteorological satellites into the present 
mechanized system will require exceptional data handling 
capacity for the National Meteorological Center. 

Machine methods have also been used for the past 3 years 
to prepare extended forecasts and monthly outlooks. A 
long history of automatic data processing, publishing, 
storage, and recall has been established by the Bureau 
Climatological Service. Since 1936 historical weather data 
have been placed on punchcards and used for climatic sum- 
marizations utilizing all types of punched card equipment 
including electronic computers. Since that time a library 
of about 350 million weather observations on punchcards 
has accumulated. This is the world’s greatest and most 
valuable collection of weather data. It serves not only the 
compilation of climatic summaries and the establishment of 
climatic risks but also is the raw material for much mete- 
orological research, including the testing of new methods of 
forecasting. 

This enormous collection is housed at the National 
Weather Records Center, Asheville, N.C., where it shares a 
building with the manuscript and autographic records of
weather data in the United States. The National Archives 
has designated the Weather Bureau as custodian of all 
weather records. These require now 175,000 square feet of 
floorspace to store. The data are used jointly by the 
Weather Bureau and the military services. Data are made
available at cost to private enterprise and universities, and
often serve in litigation.

With an influx of nearly 30 million punchcards per year 
and millions of weather records on forms and charts new 
systems of reduction and recall had to be devised to reduce 
storage bulk and increase operating efficiency. In coopera- 
tion with the Bureau of Standards and the Bureau of the 
Census a filmer and corresponding automatic electronic 
film reader (Fosdic PCI system) was designed and placed 
in operation. Cards are microfilmed at the rate of 420 per 
minute with 13,000 card images on each 100-foot roll of 16- 
millimeter film and with a storage space reduction of 180:1. 
Search for data on the film can be carried on at the rate of
4,300 card images per minute. Present capacity can keep
up with the anticipated influx of data. Processing of the 
film directly through electronic computer for statistical 
compilations and other applications is far along in 
development. 
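
The rates quoted above can be combined into a rough capacity estimate. The following is a minimal sketch in Python and is not part of the Bureau's statement: the function names are hypothetical, "nearly 30 million" is taken as exactly 30 million, and a single filmer and a single film reader running without interruption are assumed.

```python
# Figures quoted in the Weather Bureau statement above.
CARDS_PER_YEAR = 30_000_000   # "nearly 30 million punchcards per year" (rounded up, an assumption)
FILMING_RATE = 420            # cards microfilmed per minute
IMAGES_PER_ROLL = 13_000      # card images per 100-foot roll of 16-millimeter film
SEARCH_RATE = 4_300           # card images searched per minute


def rolls_per_year(cards=CARDS_PER_YEAR, per_roll=IMAGES_PER_ROLL):
    """Rolls of film needed for one year's cards, rounded up to whole rolls."""
    return -(-cards // per_roll)  # ceiling division on integers


def filming_hours(cards=CARDS_PER_YEAR, rate=FILMING_RATE):
    """Machine-hours to film one year's cards on one filmer (hypothetical setup)."""
    return cards / rate / 60


def search_hours(cards=CARDS_PER_YEAR, rate=SEARCH_RATE):
    """Machine-hours to scan one year's card images once on one reader."""
    return cards / rate / 60


def minutes_per_roll(rate, per_roll=IMAGES_PER_ROLL):
    """Minutes to film or to search a single full roll at the given rate."""
    return per_roll / rate


if __name__ == "__main__":
    print("rolls per year:", rolls_per_year())
    print("filming hours per year:", round(filming_hours(), 1))
    print("search hours per year:", round(search_hours(), 1))
    print("minutes to film one roll:", round(minutes_per_roll(FILMING_RATE), 1))
    print("minutes to search one roll:", round(minutes_per_roll(SEARCH_RATE), 1))
```

Under these assumptions one year's cards fill about 2,308 rolls, take roughly 1,190 machine-hours to film, and could be scanned once in roughly 116 machine-hours, which is consistent with the statement that present capacity can keep up with the anticipated influx.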

Manuscript records are reduced to microform by a system 
designed by a cooperative project of the Weather Bureau 
and the Bureau of the Census. Mimic (multiple-image 
microcopy camera) reduces on 70-millimeter film up to 45 
documents on each 5-inch film strip. Positive copies are 
readily readable in suitable readers and additional copies can 
be produced quickly and cheaply. 

All current climatological observations from over 11,000 
stations in the United States are punched on cards, subjected 
to automatic quality control check, and summarized for pub- 
lication by machine methods at three weather records proc- 
essing centers (San Francisco, Calif.; Kansas City, Mo.; 
Chattanooga, Tenn.). In total about 25,000 pages of cli- 
matic data produced by machine methods are made avail- 
able to scientists, engineers, business and farm operators per 
year. Most of the published data are in the hands of the users 7
weeks after the end of the period for which the observations 
were made. As more powerful tabulators become available 
on a commercial basis they will be installed at these centers. 

A special challenge on storage and recall of data is pre-
sented by weather satellites of the Tiros I type. A minimum
of 4,000 pictures per week have to be indexed, stored, and be 
available for selection for future research. Several systems 
analogous to those developed by the Air Force for aerial 
photograph storage and recall, have been under consideration 
for this purpose. As soon as funds become available a proto- 
type system will be installed for this material. Similar prob- 
lems also arise in the storage and utilization of weather radar- 
scope photographs. These begin to accumulate in sizable 
quantities. Electronic-optical scanning for these is envis- 
aged as a procedure for climatological analyses. Basic de- 
signs are available. 

The Weather Bureau has unique experience in the field of 
large-scale data processing and has been able to build up a 
staff of highly competent personnel in this field. They are 
frequently consulted nationally and have acted as advisers 
in the international field, in particular to members of the 
World Meteorological Organization. The National Meteor- 
ological Center and the National Weather Records Center 
have a continuous stream of visitors and students from all 
parts of the world. Systems and procedures developed there 
have served as models for installations in other countries. 


U.S. PATENT OFFICE 


As set forth elsewhere in this report (p. 1), the staff consulted 
with Mr. Don D. Andrews, Director, Office of Research and De- 
velopment, of the Patent Office. Following the demonstration of the 
systems and mechanical equipment now in use by that office, the
Director was requested to prepare a brief nontechnical statement of
the origins, work, and future plans for the development of an ade- 
quate information processing and retrieval system within the agency, 
which was submitted to the committee staff on March 15, 1960. 
The summary follows: 


Origin, work, and future plans 

In its mission of issuing patents for inventions, the U.S.
Patent Office is required by law to reject applications for
patents which do not, inter alia, claim novelty. The Office
has a staff of professional scientists, each of whom examines
the applications in a small sector of science and technology
(the so-called prior art). The first step in the examination
is a search of the prior art to locate any document, or com-
bination of documents, which discloses any subject matter
from which the examiner can evaluate the question of novelty.
The documents so obtained are known as references. Of
course, the other requirements in addition to novelty which
govern issuance are also considered during examination.
The examining corps spends approximately 60 percent of its 
time merely searching (as opposed to the remaining steps in- 
cluding evaluating the retrieved references) in what is one 
of the largest information retrieval operations in existence. 

In 1954, in its consideration of a Department of Commerce 
budget request, the U.S. Senate Appropriations Committee 
directed that the Department “* * * make an aggressive, 
thorough investigation as to the possibility of mechanizing 
the searching operations * * * in order to modernize, in so 
far as practical, the Patent Office operations.” The Secre- 
tary of Commerce appointed an Advisory Committee, which 
recommended in part: 

“* * * A research and development unit should be estab-
lished in the Patent Office * * *. The National Bureau of
Standards and the Patent Office should undertake a joint pro- 
gram to stimulate and develop * * * techniques specifically 
adapted to the Patent Office operations * * *.” 

On January 31, 1955, a small unit was formed within the 
classification operations of the Patent Office, and on April 
8, 1956, this matured into the Office of Research and Develop- 
ment, directly under the Office of the Commissioner of 
Patents. This Office and the National Bureau of Standards, 
jointly, separately and with outside contractors (including 
the Bureau of the Census) are engaged in a series of pro- 
grams which range from actual productive mechanized 
searching of a limited group of chemical compounds to ex- 
ploratory research in information retrieval, mathematics, 
and linguistics. 

* * * * * 

On August 3, 1957, Mechanized Examining Division A was 
created to handle the automated search of the first sector of 
the prior art which had been prepared for search mechaniza- 
tion, consisting of a collection of patents directed to a species 
of organic chemical compounds known as steroids. This
search for the elements of a compound structure has been
repeated with a different system for another sector of the
chemical arts known as phosphate acid esters. A more
sophisticated system was designed for a third sector of the
chemical arts, that of polymers, and is being test-searched not
only for the compound per se, but for the processes of making
the compound, the physical conditions of the process, and the
uses of the compound. A system of topographic representa-
tion of chemical compounds has been postulated and tested,
and the applicability of this theory is being considered in
relation to patents in both the electrical and mechanical arts.
Concomitantly, another, and more sophisticated system of
retrieval for the electrical arts is being postulated for test.

On a longer range project, a universal system of retrieval
has been postulated, and is being prepared for test in the field
of carbon chemistry. This system, known as HAYSTAQ,
is a joint project with the National Bureau of Standards
and is being programed for their computer, SEAC.

Exploratory research in linguistics has been proceeding in
this Office, in the Bureau of Standards, and on contract by
the Bureau to the Massachusetts Institute of Technology.
The ultimate solution to many retrieval problems is conceded
by all to lie, in part, in the still unexplored areas of the field
of linguistics.

This office has designed changes in existing punched card
equipment, some of which is being used in productive search-
ing. For purposes of retrieval research, an unconventional
punched card sorter, known as ILAS, was also conceived and
developed here. The equipment construction and modifica-
tions were done by the Bureau of the Census for the Office.
The Office has also rented an IBM 305 RAMAC for research
and testing purposes.

* * * * *


Any statement of future plans must be made with the
knowledge that results of research and development now
being undertaken may shorten, lengthen, or abruptly change
the direction of any of the several independent paths we are
now pursuing. Additionally, reports of studies by outside
groups which are momentarily expected, may change the
emphasis or direction of our present activities. And, per-
haps most important, the availability of manpower of proper
competence can influence the rate of progress.

With these factors in mind, we plan to expand mechaniza-
tion in the chemical field to other sectors. We expect to
provide increased emphasis to our research efforts on the
problem of converting scientific descriptions into proper
machine searchable forms at minimum costs and with lesser
demands on human skills. If postulated systems test success-
fully, we hope to have a pilot searching operation in the field
of electrical transistor circuitry before the end of the calendar
year.


In support of the need to coordinate the patent policies partic-
ularly as they relate to Government research contracts and to the
stimulation and protection of the Government’s research program,
the Subcommittee on Patents, Trademarks, and Copyrights, of the
Senate Committee on the Judiciary, filed a report in the Senate on
March 16, 1960, in which it was stated:


While expenditures for research and development have 
continued to increase at a rapid rate, and will amount to more 
than $7.8 billion in the coming fiscal year, the patent right 
confusion among the various Government agencies which deal 
with research contracts has definitely raised the issue whether 
more uniformity is not necessary. 

Although the race for technological survival in which the 
Government is presently engaged depends in large part on 
the success of its basic research program, into which billions 
of dollars are being poured, there has been no effort to co- 
ordinate patent policies for the purpose of stimulating and 
protecting the program. According to the preliminary re- 
ports made by the subcommittee staff to date, the National 
Science Foundation, which is charged with this task, has so 
far failed to consider the adverse effect on basic research of 
patent clauses presently contained in its own research con- 
tracts. 

Data-processing programs of Department 

The Department of Commerce also submitted a number of reports, 
articles, and other descriptive materials relating to the automatic 
computers being utilized by the Bureau of the Census in the processing 
of statistical data, and other information relating to automation in the 
data processing programs of the Bureau. Inasmuch as the staff has 
attempted to avoid the inclusion of extensive detail on data processing 
operations as differentiated from scientific information retrieval, none 
of these publications have been incorporated as a part of this report. 

In addition, considerable information was submitted to the com- 
mittee relative to the activities of the Office of Technical Services. 
Many of the reports from other agencies included herein, have referred 
to their close association with, and an outline of, their operations 
involving functions of the Office of Technical Services, which acts as 
a distribution medium for such agencies. Therefore none of the 
printed material furnished to the committee on the Office of Technical 
Services has been included in this report. 


LIBRARY OF CONGRESS 


The staff has given considerable attention to the role and potential 
of the Library of Congress as a collection, storage, and dissemination 
center for scientific and technological materials, documents, and other 
related information. 

The Library of Congress is one of three U.S. Government libraries 
engaged in the performance of these functions, the other two being 
the National Library of Medicine and the Department of Agriculture 
Library. 

The activity of the Library of Congress in the scientific field began 
in 1866, when, by act of Congress, the Smithsonian Institution’s science 
collection was transferred to it. Since that time, the scientific collec- 
tion has grown steadily, reaching, by June 30, 1958, a total of 1,486,000 
scientific and technological volumes, including monographs and serials. 




The expansion has been continuing through materials drawn from a 
worldwide network of more than 18,000 exchange agreements with 
governments, private research centers, libraries, universities, and other 
scientific and technical institutions, and these sources provide annually 
nearly a half million books, pamphlets, journals, and other materials in 
printed, near-print, and photographic copy form. It is reported that 
the Library is now regularly exchanging publications with 151 scientific
and technological institutions in the Soviet Union and an additional 
154 in the satellite countries, and has standing orders with 750 book 
dealers throughout the world to provide publications regardless of 
language or country of origin. 

Another source of scientific materials is found in the copyright 
function which the Library performs. Through the operation of the 
copyright law, the Library of Congress is assured that all American 
publications and many of the foreign ones in the scientific field that 
are registered for copyright find their way into the national collections. 

Dr. L. Quincy Mumford, Librarian of Congress, testified in May
1959 before the Committee on Science and Astronautics of the House 
of Representatives, with respect to the science activities of the Library, 
as follows: 


One of the major services to the Nation which the Library
of Congress performs, then, is the acquisition of the ever-
increasing worldwide output of scientific and technical
materials. On this great reservoir, the Congress, the execu- 
tive departments and agencies, industry, the scientific com- 
munity, and the public draw for their information needs. 
That they are able to do so with facility and reasonable 
speed is due to the second great service performed by the 
Library—the cataloging and indexing of the materials re-
ceived. This is a traditional library operation which is
at once a highly technical and time-consuming process. For 
most materials, the Library of Congress printed catalog card 
serves very well, not merely for the Library’s own collections 
but for the collections of the major research libraries through- 
out the country, many of which buy or otherwise utilize the 
cards which we prepare and print. The broad validity of 
the Library of Congress system of subject classification and 
the indispensability of the printed card is evidenced by the 
fact that the service has nearly 10,000 subscribers. The 
Library also maintains the National Union Catalog, con- 
taining cards locating research materials in 700 cooperating 
libraries in North America; and it publishes this and other 
catalogs in book form. 


Referring to the system of retrieval used by the Library, Dr. 
Mumford testified: 


To make more quickly and completely available for use 
certain kinds of material, including acquisitions in science 
and technology, the Library publishes the following monthly 
lists: “New Serial Titles,” listing the periodicals which the
Library and some 200 other libraries receive; the “Monthly
Index of Russian Accessions,” the “East European Accessions
Index,” and the “Southern Asia Accessions List.” The
method is much the same for all of these publications. Spe- 
cial indexing is given to each article and, where necessary, 
the author’s name and the article title are translated. Arti- 
cles on science and technology are listed in a separate section 
with many subject subdivisions. An extensive subject index 
is added at the end of the volume. The volume of the
“Monthly Index of Russian Accessions” for the year ending
March 1958 covered periodical issues (not titles) totaling
8,929, of which 5,202, or 58 percent, dealt with science and 
technology. We have already noted that the number of 
Soviet titles in science, technology, medicine, and agriculture 
was over 900, to be specific 905, or 66 percent of a total of 
1,368 titles received. During the same year, 5,632 mono- 
graphs classed in these fields were also received. In the 
volume of the “East European Accessions Index” for the
year ending December 1958, the number of journal titles
listed was 3,856, of which 1,497 were classed as science and 
technology. The number of issues in all fields was 28,067, 
while those in science and technology numbered 8,444. The 
number of monographs received in all subjects was 7,735, 
while those in science and technology amounted to 2,275. 
These specialized bibliographic productions make a for-
midable contribution to the national scientific effort but they
by no means exhaust the contribution of the Library of Con- 
gress in this area. Within the limits of budget, space, and 
its primary commitment of service to the Congress, the 
Library has assumed responsibility for administering a num- 
ber of large-scale projects which, on funds transferred pri- 
marily from the Department of Defense, provide analyses, 
abstracts, and other bibliographic services on certain types 
of literature, including scientific and technical material. 
Under this bibliographic program, the Library is making 
available to American scientific and industrial users, through 
the facilities of the Office of Technical Services of the De-
partment of Commerce, a large volume of Soviet and other
East European literature. In the past 12 months, 43,520 
abstracts of periodical articles have been prepared, including 
8,406 translations of foreign-language abstracts. Since 
August 1, 1958, a total of 29,947 abstracts has been made 
available to the Office of Technical Services. This abstract- 
ing service at the present time regularly covers 162 re 
journals, 141 of which originate in the Soviet Union. In 
addition to abstracts, 1,292 reviews in English of foreign- 
language monographs have also been prepared. Since the 
subject is often misunderstood, it should be pointed out that,
though a considerable amount of translation work is a neces- 
sary part of the abstracting service just described, the 
Library of Congress does not as a normal function prepare 
or distribute translations of foreign scientific and technical 
materials as such and at the present time has no plans to
do so.


Referring to the Library’s Science and Technology Division, Dr. 
Mumford advises: 


The third great area of service by the Library of Congress 
in the field of science and technology is its work in reference 
and bibliography. The focal point of subject competence 
in this field is the Library’s Science and Technology Division, 
which maintains, on a 7-days-a-week basis, a special science 
reading room. Established in 1949, the Division, with a 
relatively small staff of subject and language specialists, has 
an active bibliographic and reference program, and its serv- 
ices are steadily increasing. The total reference services 
provided to the Congress, to other Government agencies, to 
other offices of the Library, and to the public, numbered 
10,873 in fiscal 1958. From present trends, it appears that 
the total in the present fiscal year will reach approximately 
14,000 or an increase of about 30 percent. The reference 
services performed for the Congress in the form of special 
studies and other assistance numbered 143 in fiscal 1958. 
Present trends indicate that the number in the current fiscal 
year will reach 275, an increase of more than 90 percent over 
the preceding year. Nor is the mere number of these requests 
truly indicative of the increase either of the service to the 
Congress or the workload imposed on the staff. In 1
month—March 1959—the scope and length of the studies 
completed reached a point where they required the equivalent 
of 444 man-months of professional staff time. The range of 
these inquiries and their difficulty is shown by the following 
examples of the subject covered: “The Effects of Atomic
Radiation of Human Beings”; “Jet and Rocket Fuels and
Oxidizers”; “Atomic-Powered Airplanes and Rockets”;
“Underground Disposal of Radioactive Wastes”; “National
Science Policy”; “Alcohol-Blend Motor Fuels”; “Cold-
Weather Agriculture”; “The Future of Science”; and “Toxi-
cology and Use of Thiocarbonyl Tetrachloride.”
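
The percentage increases cited in the testimony above can be checked from the counts given. The following is a minimal sketch in Python, assuming the projected totals of 14,000 and 275 are taken at face value; the helper name is hypothetical and is not drawn from the testimony.

```python
def percent_increase(old, new):
    """Percentage growth from old to new (hypothetical helper for illustration)."""
    return (new - old) / old * 100


# Total reference services: 10,873 in fiscal 1958, about 14,000 projected.
total_growth = percent_increase(10_873, 14_000)

# Services performed for the Congress: 143 in fiscal 1958, 275 projected.
congress_growth = percent_increase(143, 275)

if __name__ == "__main__":
    print("total reference services: +%.1f percent" % total_growth)
    print("services for the Congress: +%.1f percent" % congress_growth)
```

The computed values, about 28.8 percent and about 92.3 percent, agree with the testimony's "about 30 percent" and "more than 90 percent."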


Dr. Mumford pointed out that the Science and Technology Division 
serves as a collecting and reference center for unclassified technical 
report literature of the United States and foreign countries. The out- 
put of a number of Government agencies and their contractors, in- 
cluding the Atomic Energy Commission, the National Aeronautics 
and Space Administration, and agencies within the Department of 
Defense, forms a good part of this literature. In addition to its other 
activities, the Division has a research program in documentation 
techniques, the objective of which is to make the Library’s science 
collections more responsive to current trends and needs. 

Dr. Mumford testified further that the Science and Technology 
Division has under preparation some 25 major surveys of the literature 
in the various fields of science and technology, stating that— 


Several are being done on contract for the Department of 
Defense; others are supported by the National Science 
Foundation. Although most of these bibliographies are pro- 
duced for and paid for by other agencies with a special need 
to know, those dealing with unclassified literature are in 
nearly all cases made available to the scientific community
at large. The range of coverage of these bibliographies can
be seen from their subjects, which include: the “Inter-
national Geophysical Year,” “Thermal Properties of Metals,”
“Aviation Medicine,” “Industrial Application of Radioactive
Isotopes,” and “Snow, Ice and Frozen Ground.”


Dr. Mumford concluded that the survey which he had given to the 
House committee relating to the resources and services of the Library 
of Congress in the broad fields of science and technology, should serve 
to demonstrate that, by the breadth of its base, its wide acceptance 
as an authority in bibliographical know-how, the rich competence and 
variety of its staff, and its relationships with other institutions and 
organizations within and outside of the Government, the Library of 
Congress serves as a national library of science. 

In this connection, Dr. Mumford noted that this role of the Library
is supplemented and enhanced, rather than detracted from or weak- 
ened by the two other great national library collections, those in agri- 
culture and medicine held by the Library of the Department of Agri- 
culture and by the National Library of Medicine. 


Referring to the possibility of creating a national library of science, 
Dr. Mumford stated: 


It has sometimes been suggested, and indeed it has been 
proposed in legislation, that by transfers of functions from
existing libraries and by the accumulation of separate collec- 
tions an independent national library of science and technol- 
ogy should be created, possibly within the framework of the 
proposed National Department of Science and Technology. 
In view of the existence of the three great collections men- 
tioned and their long record of service to the Nation and 
cooperation in undisturbed harmony with each other, I have 
gone on record as opposing the creation of another national
library of science as not only duplicative and wasteful but
impossible to bring into being without many years of con- 
centrated effort. 


Since its initial conference with Library of Congress officials, the 
staff of the committee has been advised by the Librarian that the 
Library has begun an intensive study of its present system and the 
desirability of improving its operations through mechanization so that 
more detailed scientific information could be provided to researchers 
and scientists. A group of three industrial organizations—Inter- 
national Business Machines, General Electric, and Ramo-Wool- 
dridge—had already conducted studies, and submitted recommenda- 
tions looking toward the mechanization of the present indexing and 
retrieval program with the objective of providing more useful services 
to the scientific community. The reports of these firms, as well as 
other sources of information, are now under study by Library officials. 

Discussions between the staff and officials of the Library of Congress 
indicated that those in charge of its programs have not yet found 
any proposal offering greater practical utility than its present system 
of storage and retrieval of scientific and technological materials. 
Until more promising systems have been devised than are presently 
available, major emphasis will be placed on collection or acquisition, 
and storage, and the Library will rely upon its printed catalog cards 
for indexing, abstracting, and retrieving. Library officials take the
position that use of automatic data processing equipment will require
much further study to determine its applicability to the Library’s 
information system, and that no machine system has as yet been 
devised which is superior to its present system. 

The staff of the committee has been advised by users of the Library’s
services, however, that while current manual procedures may be ade- 
quate for simple collection and storage, the Library’s abstracting and 
indexing systems are not completely adequate to meet the needs of the 
scientific community. For example, the Library serves as a reposi- 
tory for millions of scientific books, periodicals, and monographs. 
Scientists seeking detailed information are required to wade through 
enormous quantities of materials in order to determine whether the 
information they are seeking is, in fact, contained in any particular 
work. Mere subject headings, chapter headings, and brief topical 
sublines, do not meet their research requirements for speed and ac- 
curacy, or for specific scientific data not indicated by the indexing 
procedure followed by the Library. As a result, the present system 
causes considerable delay and waste of the valuable time of scientific 
researchers, which should be applied to the development of new 
scientific procedures. 

The Library of Congress, on the other hand, states that it has 
received nothing but commendation for the work of its Science and 
Technology Division. In addition to the unparalleled science col- 
lections which it has to exploit, this Division has a competent profes- 
sional staff of 42 to handle inquiries of a specialized nature. Of these 
42, 9 have earned doctor’s degrees, 12 have master’s degrees, and 6 
have bachelor’s degrees in science and engineering. Fifteen other 
members of the professional staff hold degrees ranging from bachelor’s 
to doctor’s in the fields of social science, library science, and languages. 
The Library believes that no other group engaged in information work 
in this area has comparable scope and breadth in science training. 
The volume of the Division’s work is indicative of its service to the 
scientific community, the Library feels; in the first 9 months of fiscal 
1960, the Division handled 11,904 requests for information, with 
inquiries being received from all the 50 States and from 37 foreign 
countries. 

The staff has found that officials of the National Library of 
Medicine, the Library of the Department of Agriculture, and 
private technical libraries such as the John Crerar Library of 
Chicago, strongly support the necessity for providing more than 
just indexing and the brief breakdowns now used by the Library 
of Congress. They are continually studying the development of new 
systems which would provide information in much more detail than 
is customary in normal library procedures. Their greatest handicap 
has been the lack of adequate funds to implement the programs they 
require. 

The Western Reserve University School of Library Science, in 
announcing a program of courses for training documentation special- 
ists, pointed up the need for modernizing library techniques, particu- 
larly as they relate to science and technology, stating that— 


* * * the quickening pace of research in recent years has 
resulted in a parallel expansion in the volume of publication. 
For most effective application of this newly acquired and recorded
knowledge, it becomes necessary to extend and supplement
traditional library methods.

Thus the new field of documentation has been opened and 
expanded. Though closely related to librarianship, docu- 
mentation differs in a number of important respects. 

Librarianship is concerned with the universal task of 
channeling all kinds of graphic records to all users, for all 
purposes, at all levels, to the end that all recorded human ex- 
perience may be as socially useful as possible. 

Documentation, by contrast, is that aspect of librarianship 
concerned with improving graphic communication within and 
among groups of specialists; it involves that portion of 
librarianship which treats of the materials and needs of 
research and scholarship, and hence it is particularly con- 
cerned with abstracting, indexing, classification, searching 
operations, compilation of bibliographies, and similar means 
for meeting specialized information requirements. Though 
other phases of librarianship may draw upon the methods 
and experience of the documentalist, the latter is not con- 
cerned with the popular, the recreational, or the lay interests. 

The application of electronic machine searching for 
bibliographic reference work has been a particularly impor- 
tant recent development in documentation. New tools are 
thus being made available to enable the documentalist to 
meet the demanding requirements of the present and the 
future. 


In response to the request of the staff, the Librarian of Congress 
submitted a report entitled “Mechanization of Services and Functions
in the Library of Congress,” which is set forth below. In his
letter to the staff director, dated March 28, 1960, transmitting this 
report, the Librarian stated: 


On January 15, 1960, Dr. Roy P. Basler, Director of our 
Reference Department and Chairman of the Library’s Com- 
mittee on Mechanized Information Retrieval, with two 
members of his Committee, met with you and members of 
your staff to discuss mechanization in the Library of Con- 
gress. Dr. Basler informed me of the tenor of your discus- 
sion and reported your continued interest in the Library’s 
services in the field of science and technology and your 
desire for a report on the Library’s activities with respect to 
mechanization. 

With respect to our science and technology services, there 
have been no major innovations in our activities since my 
statement to the Committee on Government Operations of 
the U.S. Senate, which is printed in part 1 of the hearings on 
the Science and Technology Act of 1958, pages 10-14, and 
my somewhat later statement before the House Committee 
on Space and Astronautics, which is printed in the hearings 
(No. 24) on the Dissemination of Scientific Information, 
pages 1-16. Those changes which we have noted concern 
increases in the number of services rendered, some new 
bibliographic and documentation projects, and intensifica- 
tion of acquisition of science publications. Effort is continually
exerted to increase the staff and subject competence
of the Science and Technology Division, and in this we have 
had congressional support, as reflected in appropriations 
voted by the Congress for fiscal year 1959-60. 

I am enclosing herewith a report on “Mechanization of
Services and Functions in the Library of Congress”:


“Mechanization of services and functions


“The Library of Congress has introduced mechanical 
equipment for the conduct of many of its operations as a 
normal aspect of administration and management planning. 
This report will assume that no detailed description is neces- 
sary of the use of such equipment insofar as it concerns normal 
administrative functions. The focus here is on Library oper- 
ations, in which mechanization is designed to provide a more 
economical, efficient, and effective operation. 

“Loan Division—Loan Records on Punched Cards.—The 
record of books lent to Government libraries is prepared on
punched cards. The classification number of the books, the 
author’s last name, a sufficient part of the title to identify 
the item, the date and place of publication, the code designa- 
tion for the borrower, and the date of the loan are punched 
into a blank IBM charge form. The 026 printing punch 
enables the key-punch operator to verify the accuracy of the 
charge record without waiting to have it interpreted, and 
makes information immediately available concerning the 
material issued. At the close of the day, the charges are re- 
produced in four packs. The first, or discharge pack, is 
mechanically sorted in shelf-list (book number) order for the 
control file; the second pack by the borrowers’ codes and then 
alphabetically by author for the borrowers’ account file; the 
third pack is sorted in shelf-list order for filing in the Library’s 
central charge file, in which are recorded the locations of all 
books removed from the shelves for more than 24 hours; and 
the fourth pack is reserved for statistical analyses. The 
punched cards are interfiled manually in all the files, since 
only about 500 charges a day are involved and the total num- 
ber of current ones remains at approximately 10,000. Even 
though it would be possible to interfile mechanically, it would 
consume more time in actual practice and would be less 
economical than the manual operation.
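
The four-pack procedure described above can be illustrated with a short modern sketch in Python. It is offered only as an illustration of the sorting logic, not as a description of the Library's equipment. The record fields are a subset of those the report says are punched into each card; the names `Charge` and `build_packs` and the sample charges are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Charge:
    """One loan charge; fields are a subset of those named in the report."""
    class_number: str    # classification (shelf-list) number of the book
    author: str          # author's last name
    short_title: str     # enough of the title to identify the item
    borrower_code: str   # code designation for the borrowing library
    loan_date: date


def build_packs(charges):
    """Reproduce the day's charges in four packs, each in its own order."""
    by_shelf = sorted(charges, key=lambda c: c.class_number)
    return {
        # discharge pack, shelf-list order, for the control file
        "control": by_shelf,
        # by borrower code, then alphabetically by author
        "borrower_account": sorted(
            charges, key=lambda c: (c.borrower_code, c.author)
        ),
        # shelf-list order again, for the central charge file
        "central_charge": list(by_shelf),
        # unsorted copy reserved for statistical analyses
        "statistical": list(charges),
    }


# Hypothetical sample data, for illustration only.
day = [
    Charge("QC21", "Smith", "Physics", "AGR", date(1960, 3, 1)),
    Charge("QA76", "Jones", "Computing", "NLM", date(1960, 3, 1)),
    Charge("QD31", "Adams", "Chemistry", "AGR", date(1960, 3, 1)),
]
packs = build_packs(day)
```

With only a few hundred charges a day, each of these sorts is trivial for a modern machine; the report's point is that on card equipment the later interfiling step was still cheaper by hand.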

“When material is returned, it is first checked against the
control file and the charge is withdrawn. The charge is 
then matched against those in the borrowers’ account file 
and the central charge file. This matching is performed 
manually. The charge withdrawn from the borrowers’ ac- 
count file is returned to the borrowing library. It serves as 
a receipt and affords the borrower a record of material lent, 
containing bibliographic information which may be used 
again if it becomes necessary to borrow the same book an- 
other time. 

“At intervals of 4 to 6 weeks, lists of overdue material are
sent to Government library borrowers. The lists are pre- 
pared on the tabulator from the punched cards. For each
account there are address cards, and the charges for overdue
items are withdrawn manually. The machine listing is es- 
pecially effective and economical for long lists. This opera- 
tion, once tedious and time-consuming, is now almost entirely 
automatic. 

“The statistical file is analyzed as needed to provide infor- 
mation concerning the amount of materials in the various 
subject fields borrowed by Government libraries and to com- 
pile other data. 

“Serial Record Division—‘New Serial Titles’.—The Serial
Record Division maintains a central visible card file record 
which shows the Library’s holdings of serial publications. 
A serial is a publication issued in successive parts, usually at 
regular intervals, and, as a rule, intended to be continued in- 
definitely. Serials include periodicals, annuals (reports, 
yearbooks, etc.) and memoirs, proceedings, and transactions
of societies. The central serial record now contains approxi- 
mately 300,000 title entries; it is undoubtedly the largest 
such record in existence. Efforts to find a way of mechaniz- 
ing the recording operation have so far not been successful. 

“An important corollary operation carried on by the Serial
Record Division is the identification and recording of new 
serial titles. In 1951 the Library began publishing, at 
monthly intervals, Serial Titles Newly Received. In 1953 its 
title was changed to New Serial Titles. This has come to be 
extended not only to titles received by the Library of 
Congress but to those reported by some 300 other cooperating
libraries. Ever since its inception, the fundamental
basis of the publication has been the IBM punched
card.

“When the basic text, which in this case is composed of 
bibliographic entries for new serial publications, has been 
edited, the information is converted to punched cards. Code 
numbers for subjects (taken from the Dewey decimal classi- 
fication) and for country and language of publication are 
assigned to each title and punched into the cards, along with 
descriptive and other information. Whenever possible, 
information on source and price are included, and the 
libraries that have reported copies of each title are listed 
below the title, with an indication of the extent of the file in 
each library. The number of cards for an individual title may 
be anywhere from five to over a hundred. 

“Two important advantages of the punched-card method 
were foreseen when the publication began. First, it would be 
possible to print lists from the cards at will, without any
further editing or proofreading, once the information was in
punched-card form. Second, there was the possibility of
mechanically preparing special lists of titles, selected on the 
basis of subject, country, or language. The first advantage 
has been realized, with the same punched card providing copy 
for successive monthly, annual, and longer cumulations. 
To a degree, coding by subject has also been mechanically 
exploited, inasmuch as cards from the monthly alphabetical 
issues are mechanically reproduced and then arranged 


85 








86 





DOCUMENTATION OF SCIENTIFIC INFORMATION 


according to subject classification for the monthly issues of 
New Serial Titles—Classed Subject Arrangement, which 
began publication in 1956. 
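
The two uses of the coded cards discussed here, a classed subject arrangement and a special list selected by subject, country, or language, amount to a sort and a filter over the same records. The following Python sketch shows that logic under stated assumptions: the record layout, the country and language codes, the function names, and the sample titles are all hypothetical and are not taken from New Serial Titles itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialTitle:
    title: str
    dewey: str      # subject code taken from the Dewey decimal classification
    country: str    # hypothetical country code
    language: str   # hypothetical language code


def classed_arrangement(titles):
    """Arrange titles by subject code, then alphabetically within a subject."""
    return sorted(titles, key=lambda t: (t.dewey, t.title.lower()))


def special_list(titles, dewey_prefix=None, country=None, language=None):
    """Select titles meeting every criterion supplied, in alphabetical order."""
    chosen = [
        t for t in titles
        if (dewey_prefix is None or t.dewey.startswith(dewey_prefix))
        and (country is None or t.country == country)
        and (language is None or t.language == language)
    ]
    return sorted(chosen, key=lambda t: t.title.lower())


# Hypothetical sample entries, for illustration only.
titles = [
    SerialTitle("Physics Bulletin X", "530", "US", "eng"),
    SerialTitle("Annals of Sample Chemistry", "540", "FR", "fre"),
    SerialTitle("Applied Mechanics Notes", "531", "US", "eng"),
]
physics = special_list(titles, dewey_prefix="53")
classed = classed_arrangement(titles)
```

Here one record holds all the information for a title, which is exactly what the 80-column card could not do; the report notes that a single title might occupy anywhere from five to over a hundred cards.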

“Unfortunately, the amount of time required to sort cards 
mechanically and the fact that many cards are required to 
store all of the distinguishing information needed for each 
title have made it impossible to run special listings with 
economy and ease. An alternative procedure, the creation 
and maintenance of additional files of cards, containing the 
same information as the master file of punched cards but 
arranged numerically according to subject or country num- 
bers, was tried but could not be carried out because of ma- 
chine and staff demands of other work in the Library. The 
secondary files were cumbersome to begin with, and all addi- 
tions to them required manual interfiling. The Library was 
advised not to attempt to assign alphanumeric characters 
to each title at the time the project was planned in consulta- 
tion with the punched-card experts; this was perhaps a mis- 
take, as all alphabetical interfiling has had to be done by hand. 

“Although the use of punched cards in the preparation of 
New Serial Titles has demonstrated the desirability of using
a less limited storage medium than the 80-column punched
card when dealing with bibliographic entries for serial pub- 
lications, it has also demonstrated the ease with which 
printed copy can be prepared from the same punched cards. 
To present this publication, in its mechanized application, 
in a proper perspective, it may be noted that for the past 5
years it has provided a listing of approximately 16,500 new 
serial titles per year. At least a third of this number are 
serial publications in the fields of science, technology, agri- 
culture, and medicine. Not only does the publication reflect 
materials acquired by the Library, but since 1953 it has listed 
holdings of cooperating libraries as well, thus serving as a 
supplement to the Union List of Serials and providing a 
major resource for research. 

“National Union Catalog—Key to Symbols.—The National 
Union Catalog, established in the Library in 1926, serves as a 
means of locating books of research value in the collections of 
cooperating libraries. These libraries send copies of the 
catalog cards they prepare for their own use to the Library 
for filing in the NUC. Usually, when two or more libraries 
report the same book, one card is sufficient to represent it in 
the NUC; symbols are added at the bottom of the cards 
identifying the libraries which have the book. 

“The Union Catalog has grown steadily since its forma- 
tion more than 30 years ago. It now contains approxi- 
mately 14 million cards. The number of libraries reporting 
their holdings to the NUC has also grown over the years. 
The effective use of the NUC requires not only that a list of 
the symbols used in identifying the cooperating libraries be 
circulated but that this list be revised and kept up to date
in successive editions. The first list was issued in 1932; other
editions were published in 1933, 1936, 1942, 1953, 1954, and 
1959. The cost of producing ever-larger lists and printing
them for use by other libraries has led to the development of
better and more economical methods. The first six editions 
were produced either through conventional printing or 
photo-offset methods. The seventh edition, on 134 pages, 
was produced by typing single lines on IBM cards, using an 
IBM electric executive typewriter, which permits very exact 
positioning of the type on the card required in the further 
steps. The cards were arranged alphabetically, first by sym- 
bol and later by the name of the library, and fed in two runs 
into a Recordak listomatic camera, which was made avail- 
able by the National Library of Medicine. This camera, 
exactly focused on the line, shoots about 300 cards (i.e. lines) 
per minute and produces the negative from which plates are 
prepared. The original cards have been retained and form 
the permanent copy for all future issues of the publication. 
Changes in symbols are accomplished by removing the old
card and replacing it by a new one with cross-references as
needed. Cards for libraries that are listed for the first time
are typed and added to the file.


“This procedure enables the Library to prepare copy of a 
revised list at a moment’s notice. Since the cards can be 
run in an arrangement by symbol or by name, typing time 
was cut in half for the production of copy for the first issue 
produced by this method, and will be saved again in all
future issues.





“Manuscript Division—Index to Presidential Papers.— 
Public Law 85-147, approved August 16, 1957, provided for 
microfilming and indexing the 23 collections of Presidential 
Papers (numbering approximately 2 million pieces) in the
custody of the Library. The index for each collection is
devised to give entries for all writers of letters other than the
President himself, as well as entries for such items as diaries
and wills. 

“The indexes are prepared with the use of an IBM 026 
printing punch, with accompanying rack, and an IBM 407 
printer. Index entries are punched on the cards, which are 
then proofread and delivered to the Library’s tabulating 
office. In that office, copy is reproduced within 2 days 
after delivery. The copy is prepared in two forms: a chrono- 
logical arrangement of the index entries, and an alphabetical 
arrangement by name of correspondent. After final proofreading,
the index copy is mounted on large sheets for photo-offset
reproduction.

“The present method of preparing the index for publica- 
tion was selected after exploration of a number of possible 
methods. Although not as handsome as if it were printed 
by a letterpress, the final index will nevertheless be a very 
legible and attractive publication, accomplished at a prob- 
able saving of approximately 40 percent over any of the 
other methods investigated. 

“Order Division—Products of mechanization as aids to book 
selection.—The Order Division of the Library of Congress 
handles the purchasing of library materials. In the process, it 
is responsible for accounts originating in appropriations, in
gift and trust funds, and in special accounts, which vary in
their total from year to year but average approximately 
$500,000. The controls for obligating funds, for managing 
subscription and continuation orders, for developing blanket
orders with foreign book dealers, etc., are handled by means 
of IBM equipment. For management purposes, procedures 
on continuation orders are constantly reviewed, both for 
purposes of securing economies in fiscal operations and for 
insuring that the Library secures needed items from among
the more abundant materials that may be available, but not 
all of which may be useful. The lists of continuation orders, 
which are printed from IBM punched cards, are reviewed 
periodically by recommending officers in the Reference De- 
partment to determine whether individual subscriptions 
should be canceled because of duplication of information, 
lesser substantive research value, etc. 

“Mechanization of otherwise routine fiscal operations in 
the Order Division has special significance to the reference 
services. By helping to simplify the great amount of detail 
work involved in acquisitions it has increased the speed and 
facility with which library materials may be obtained and 
ultimately made available to congressional and other users. 

“Science Division—Research and development in bibliography 
and information retrieval—The Science and Technology 
Division inaugurated a research program 2 years ago in 
documentation techniques. The objective of this program 
is to make the Library’s science collections and the Division’s 
reference and bibliographic products as responsive as possible 
to current trends and needs. This is the particular function of
a science research specialist, whose principal duty is to study 
developments in documentation techniques for the purpose of 
evaluating their usefulness and applicability to the overall 
work of the Division. Based on this study, which includes 
familiarizing himself with a wide variety of mechanisms and 
equipment available on the commercial market or under 
development by industry and Government, the research 
specialist recommends techniques that appear worthy of 
trial or other application within the Division, and in selected 
instances actually conducts such trials on an experimental
basis.

“The first instance in the Division of the application of 
mechanized techniques to a bibliographic project was the 
indexing on punched cards, using the peek-a-boo methods, of 
an International Geophysical Year bibliography. The ap- 
proach employed consisted of devising an index which, by 
correlating unit-concepts, would provide access to the nearly
3,000 references collected for the bibliography. The peek-a- 
boo cards were used to implement the correlation of terms. 

“The second step considered desirable in further testing 
the applicability of mechanization to a Division problem was 
to provide some form of index that could be printed in book 
form by machine. The technique selected for trial was the 
Ledley ‘Tabledex’ scheme, which employs a computer. 
While the actual printing was to be accomplished by Dr.
R. S. Ledley, of George Washington University and the
National Bureau of Standards, using the George Washington 
University computer, the work done in the Division involved 
very time-consuming selection of key-terms, or descriptors, 
and the establishment of the means by which these descrip- 
tors were to be combined in the actual index. To test the 
procedure, a sample ‘Tabledex’ for a portion of the IGY material
was prepared by Dr. Ledley’s group. The Division’s
work on the index is nearing completion for final ‘Tabledex’ 
printing by the George Washington University facility. 

“The third effort, which will probably be undertaken 
during 1960, is the application of the experience gained from 
the IGY bibliography to a larger project, namely the indexing
of nearly 20,000 abstracts produced by the Division’s SIPRE
(Snow, Ice, and Permafrost Research Establishment) bibliography
project. This experiment will involve the use of
the Division’s recently acquired ‘Termatrex’ indexing equipment.
This equipment provides a means for making holes
in subject cards at precise positions which indicate the num- 
ber assigned to any document pertaining to a particular 
subject, the subject being represented by a descriptor card 
in a set of several hundred. The Division’s model of the 
‘Termatrex’ device has a capacity for indexing 40,000 docu- 
ments on one set of cards, each card being approximately 18 
inches square. The correlation technique is again the 
peek-a-boo method. 
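
The peek-a-boo correlation described above can be sketched in a few lines of Python. In this illustration each descriptor card is modeled as an integer whose bits stand for hole positions, and superimposing cards is modeled as a bitwise AND, so that "light" passes only where every card has a hole. This is a minimal sketch under those assumptions and not a description of the ‘Termatrex’ equipment; the class name, method names, zero-based document numbering, and sample descriptors are hypothetical. Only the 40,000-document capacity is taken from the report.

```python
class PeekABooIndex:
    """One card per descriptor; a hole at position n marks document n."""

    def __init__(self, capacity=40_000):
        self.capacity = capacity   # documents indexable on one set of cards
        self.cards = {}            # descriptor -> int used as a field of holes

    def punch(self, descriptor, doc_number):
        """Make a hole on the descriptor card at the document's position."""
        if not 0 <= doc_number < self.capacity:
            raise ValueError("document number outside card capacity")
        self.cards[descriptor] = self.cards.get(descriptor, 0) | (1 << doc_number)

    def search(self, *descriptors):
        """Superimpose the named cards and read off the coinciding holes."""
        if not descriptors:
            return []
        light = -1  # all bits set: nothing is blocking the light yet
        for d in descriptors:
            light &= self.cards.get(d, 0)   # a missing card blocks everything
        return [n for n in range(light.bit_length()) if (light >> n) & 1]


# Hypothetical sample indexing, for illustration only.
index = PeekABooIndex()
index.punch("snow", 3)
index.punch("snow", 17)
index.punch("permafrost", 17)
index.punch("permafrost", 250)
index.punch("ice", 17)

both = index.search("snow", "permafrost")
```

Adding a third card to the stack narrows the result further, which is the sense in which the method correlates terms rather than merely listing documents under each one.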

“Another application of the system which is being con- 
sidered for initiation in 1960 is the control, for retrieval 
purposes, of the Division’s growing files of reference corre- 
spondence. This file embodies research that has been 
completed in response to inquiries in the field of science and 
technology. The work that was spent in preparing these 
replies can be taken advantage of when inquiries are repeated 
on the same subjects. In addition to the obvious economy 
of such indexing, there is the further benefit of obtaining 
experience with a much wider range of subjects than is 
represented by any of the specific bibliographic projects so far 
indexed. The ‘Termatrex’ device will be used to achieve
subject control of the reference correspondence. 

“It is hoped that this index approach may be applied
ultimately to the Division’s collection of technical report 
literature. This collection involves hundreds of thousands 
of technical research reports, and should provide a sub- 
stantial basis for the evaluation of the overall applicability of 
documentation techniques to library problems in science and 
technology. 

“Photoduplication Service.—The Photoduplication Service
was established in the Library on a revolving-fund basis, 
through a grant from the Rockefeller Foundation in 1938, 
in order to provide means for researchers and others to secure 
photocopies of materials in the Library’s collections which 
might otherwise not be available, and to provide this service 
at an economical level. The Photoduplication Service has 
consistently attempted to improve its operations, increase
the effectiveness of its services, and maintain a low-price
policy by taking advantage of technological developments in its
field.

“The most significant recent development has been the
adoption of the revolutionary Xerographic process, which
has brought major changes to the operating procedures and
quantity production of the Photoduplication Service Laboratory.

“The introduction of this equipment (Haloid-Xerox Copy- 
flo No. 1) in early 1958, has resulted in a marked increase of 
orders for facsimile-type reproductions, has improved service, 
and has achieved lower prices. The effect on the Laboratory 
was the immediate displacement of five pieces of photostat 
equipment and, later on, the retirement of a total of six 
continuous-type printers, enlargers, and processors. 

“The photoduplicates which are now produced electrostatically
replace a substantial proportion of those formerly
produced by slower manual operations on photostat equip- 
ment. 

“The Photoduplication Service also processes microfilm on 
automatic equipment, and, in general, uses machines wher- 
ever possible to supplement or supplant slower and more 
expensive manual operations. 

“During 1959 the Photoduplication Service produced 
1,317,915 electrostatic prints; 4,940,419 negative microfilm 
exposures; and 790,910 feet of positive microfilm. Of these 
reproductions, a substantial portion was made for industrial 
libraries, research laboratories, and Government agencies; the 
material, in the main, was scientific and technical in nature. 

“Mechanization of information retrieval in the Library of 
Congress.—The foregoing description of specific applications
of mechanical equipment to operations in the Library will
serve to demonstrate the continuing interest on the part of
the Library’s administration in developing the best methods
to accomplish its work. This interest is obviously not a
new one, but rather an expression of concern with efficiency 
and economy of operation within the framework of program 
objectives. 

“The more recent intensification of interest in the mechani- 
zation of information storage and retrieval on the part of 
the scientific fraternity (not at all limited, however, to the 
scientific community) found a corresponding quickening of 
interest on the part of the Library of Congress. The Librar- 
ian of Congress established a Committee on Mechanized 
Informational Retrieval on January 28, 1958. The membership
of the Committee was drawn from the reference and
processing departments and included chiefs of the several 
divisions most intimately concerned with the organization 
and use of the Library’s collections. 

“The meetings of the Committee during 1958 and 1959 
were designed to establish the framework for its future activities.
The problem, as it was recognized by the Committee,
was one of coping with the Library’s broad and
complex system for the storage and retrieval of information.
The Committee was loath to become concerned with specific 
projects within this basic system, lest hasty adjustments to 
opportunistic mechanization compromise the Library’s 
operations, disturb cooperation among the parts of the system,
preclude future compatibility for expansion of mechanization
after its commencement, and adversely affect
its service potential to the Congress, the Government, and 
to other libraries which are particularly dependent upon 
traditional services provided by the Library. 

“The Committee’s first recommendations were designed 
to provide for more and basic study of the relationships 
within the Library’s information system, and the applica- 
bility of mechanical equipment to this system. At about 
the time these recommendations were being readied for sub- 
mission to the Librarian, the Library received overtures from 
several of the larger industrial concerns active in producing 
equipment and services in this field, asking to be permitted 
to undertake brief studies of the Library’s operations in order 
to provide the Library with recommendations and proposals 
as well as in order to secure experience in what is undoubtedly 
one of the largest information storehouses in the world. 
Three companies (International Business Machines Corp., 
General Electric Co., and Ramo-Wooldridge, Inc.) sent 
teams to the Library to pursue studies which ranged from 1 
to 2 weeks. The companies’ reports were studied by the 
Committee after they were received. Thereafter, the Com- 
mittee met to consider the implications of these reports and 
to recommend the future administrative posture to the 
Librarian. These recommendations have just been sub- 
mitted and have not yet been considered in the detail 
necessary to permit final action by the Librarian.” 


Problems of information retrieval in public libraries 


Following consultations with private industry relative to library 
programs, the staff determined that despite their tremendous resources, 
public libraries are neither equipped nor organized to serve as 
science storage and information centers. This is due, in con- 
siderable part, to the fact that libraries traditionally place major 
emphasis on storage, inventory, economy, and internal efficiency.
Instead of being designed to meet and satisfy a user’s needs, they 
tend to adopt procedures designed to meet their own needs. Thus, 
the card index facilitates document retrieval by title and author, 
but does little to improve information retrieval no matter how 
elaborate the index may be. It has been stated that the current 
interest among researchers and others requiring scientific information 
is based, in part, upon the very fact that the library-type of index 
does not provide adequate access to needed stored information. In 
this connection, it has been pointed out that one of the basic problems 
lies in the difference in function between a public library and a tech- 
nical library. Whereas the public library has a long-term objective 
to satisfy cultural needs, the technical library’s objective is to dispense 
a more tangible product; answers to problems of interest on a day-to- 
day basis. Public-type libraries operating traditional library systems 
may legitimately classify as unreasonable an urgent request for
specific information in a conveniently wrapped package, and can
save a considerable amount of money by refusing to service such a 
request. On the other hand, a technical library, which is able to 
respond rapidly to such a request, can save considerable amounts of 
money in researchers’ time and thereby make it reasonable in relation 
to the value of services rendered. Accepting this premise, it appears 
to follow that technical libraries should not be operated on the same 
principles as public libraries. 

A recent survey on the value and importance of information 
retrieval, conducted by the International Business Machines Corp., 
reveals also that much valuable time and large sums of money may 
be saved by the utilization of proper systems for automatic data 
processing and information retrieval. 

In this connection the IBM survey (which was, according to the 
NSF, conducted by Lockheed Aircraft Corp., within its own organi- 
zation) reveals that, once the need for information arises, the 
requestor’s productivity drops until the information is obtained. 
Some men working on a single job virtually stop altogether. 
Others, who are pursuing several tasks, proceed with another
problem, but such efficient utilization of time is uncommon.
The typical researcher slows down, tries to find alternate ways
of solving his problem, or seeks alternate sources of information, 
usually by the consultation process. According to the survey, 
efficiency during waiting time drops 25 percent and costs a private 
industrial organization approximately $2.50 an hour. Thus, a 1-hour 
delay can eat up more than 20 percent of the savings generated by a 
retrieval, and a 5-hour delay can produce a loss. If any sizable portion
of the savings is spent on retrieval mechanization, losses develop 
from much shorter delays, and delays in excess of 1 hour appear to be 
economically unsound. 
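
The arithmetic behind these figures can be made explicit. The survey as quoted here gives the cost of delay ($2.50 an hour) but not the dollar savings per retrieval; the two statements together imply that the savings are at most about $12.50, since 1 hour of delay ($2.50) exceeds 20 percent of the savings and 5 hours ($12.50) can exhaust them. The Python sketch below works through that inference. The function names and the $12.00 sample figure are assumptions introduced for illustration and do not come from the survey.

```python
COST_PER_HOUR_OF_DELAY = 2.50  # dollars per hour, as reported by the survey


def delay_cost(hours):
    """Cost of lost efficiency while the requester waits."""
    return COST_PER_HOUR_OF_DELAY * hours


def implied_upper_bound_on_savings():
    """If 1 hour of delay eats more than 20 percent of the savings,
    the savings must be less than (cost of 1 hour) * 100 / 20."""
    return delay_cost(1) * 100 / 20


def net_saving(savings_per_retrieval, hours_of_delay):
    """Savings from one retrieval after subtracting the cost of waiting."""
    return savings_per_retrieval - delay_cost(hours_of_delay)


# Hypothetical savings figure just under the implied bound.
sample_savings = 12.00
share_lost_in_one_hour = delay_cost(1) / sample_savings
after_one_hour = net_saving(sample_savings, 1)
after_five_hours = net_saving(sample_savings, 5)
```

With the sample figure, one hour of waiting consumes a little over a fifth of the savings and five hours turns the retrieval into a small net loss, which matches the pattern the survey describes.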

Another important factor which is involved may be said to be 
psychological in nature. In this connection the survey reports: 


Users simply get impatient with mechanized systems that 
incur longer delays and eventually refuse to use them.
We have investigated numerous mechanized retrieval systems
and have found that among those that failed, the most 
common cause was excessive delays. 


Concerning the importance of proper mechanization for information 
retrieval, the survey stated: 


It has been our experience in almost every field that intro- 
duction of a work-saving machine reduces cost and effort to 
the point where work previously considered economically in- 
feasible or inconvenient can be justified and that it thereby 
generates an increased demand for its service. 


The survey concluded that library retrievals can reduce the use of 
high-cost information sources only if the quality of retrievals is 
appreciably improved. False retrievals are a major problem in 
hand systems and more complex indexing of the kind that machines 
can handle should reduce their number. Eliminating false retrievals 
would salvage the delay time users waste in waiting for what they 
hope will be a valuable source of information. More important 
would be the increased confidence in the reliability of the library 
that would develop from a reduction in false retrievals. 







DOCUMENTATION OF SCIENTIFIC INFORMATION 93 


In this connection the survey points out: 


Mechanization can presumably also reduce the waste that 
results from not finding information that actually is in the 
store. Several years ago, a persistent search of our Library
turned up 193 references to crack propagation in metals al-
though at the time not a single card in our index provided a
direct lead to any of these documents. Both the elimina-
tion of false retrievals and the finding of obscure bits of
information require highly sophisticated indexes which are
best manipulated by machine.
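
The two kinds of waste described by the survey, false retrievals and material that is in the store but is not found, can be expressed as simple ratios. The sketch below is an illustration only; the function name and the sample document identifiers are hypothetical and are not taken from the survey.

```python
def retrieval_waste(retrieved, pertinent):
    """Return (false_retrieval_rate, miss_rate) for one search.

    retrieved -- identifiers the index led the user to
    pertinent -- identifiers actually bearing on the question
    """
    retrieved, pertinent = set(retrieved), set(pertinent)
    false_hits = retrieved - pertinent   # delivered but useless
    missed = pertinent - retrieved       # in the store but not found
    false_rate = len(false_hits) / len(retrieved) if retrieved else 0.0
    miss_rate = len(missed) / len(pertinent) if pertinent else 0.0
    return false_rate, miss_rate


if __name__ == "__main__":
    # Hypothetical search: four documents delivered, three of them useful,
    # and two useful documents left undiscovered in the store.
    print(retrieval_waste({"d1", "d2", "d3", "d4"},
                          {"d1", "d2", "d3", "d5", "d6"}))
    # The crack-propagation case: 193 pertinent references, none led to
    # directly by the card index.
    print(retrieval_waste(set(), range(193)))
```

In the crack-propagation case the miss rate is 1.0: every pertinent reference was in the store, yet the index led directly to none of them.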


NATIONAL SCIENCE FOUNDATION


The staff of the Office of Science Information Service, National 
Science Foundation, was requested by the staff, on January 6, 1960, 
to compile information required in connection with the current staff 
study of the Senate Committee on Government Operations on docu- 
mentation, indexing and retrieval of scientific information. At that 
time representatives of the Office of Science Information Service, who 
met with members of the committee staff, agreed to provide data on 
the following topics: 

1. Foundation grants or contracts in the general area of mech- 
anized handling of information. 

2. A summary of Foundation information programs, including
the role of mechanization and activities of other agencies, and
highlighting some of the problems facing the improvement of in-
formation activities.

3. The Physical Sciences Information Exchange, with discus-
sion of the characteristics of this type of information activity as
it relates to other kinds of information services.

4. Examples of non-Government activities which are develop- 
ing or supporting the development of mechanized information 
systems. 

5. Highlights of progress on and problems involved in in- 
formation areas not associated with mechanization. 

6. Activity in the professional scientific societies aimed toward 
improved information systems in various fields of science. 

On March 17, 1960, Dr. Burton W. Adkinson, head of the Office of 
Science Information Service submitted to the staff, in response to this 
request, a report on the role of the National Science Foundation in 
which the above topics were incorporated into a comprehensive pres-
entation of current developments in scientific information, on the
premise that the separate topics could be more effectively considered
in relation to the whole field of scientific information. 

Also, in response to a request of the staff for a review of the first 
proofs of the proposed report, the following general comments were 
submitted by Dr. Alan T. Waterman on May 10, 1960: 


The draft report on “Documentation, Indexing, and Re- 
trieval of Scientific Information” is a useful collection of 
information about activities of a number of organizations, 
and progress that is being made in devising new and better 
methods of handling scientific information. Your estimate 
of the problem is reflected in the paragraph on page 30 which
states:

“With certain exceptions there was also general agreement 
among those officials representing industry and government 
that the basic problem related, not only to the development 
of new automation and mechanized equipment; but also per-
haps more precisely to develop more effective means of de-
scribing the information contained in a given document and
of describing the information desired by an inquirer—all so
that these two descriptions may be capable of being matched
and pertinent information obtained by the inquirer. This
would require, among other things, formulating engineering
systems and special codes or thesauri for better and more
rapid utilization of such equipment. A major problem is to 
develop the needs and requirements of any given operation, 
governmental or private, and the systems engineers and man- 
ufacturers of mechanized equipment will then be in a position 
to provide the necessary automation to achieve the objective 
sought, within the limits of coding and input abilities which 
are made available.” 
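
The matching of the two descriptions mentioned in this paragraph might be pictured as follows. This is only a minimal sketch: the thesaurus entries, the index terms, and the function names are invented for illustration and do not represent any system discussed in the report.

```python
# Hypothetical thesaurus mapping variant words to one preferred index term.
THESAURUS = {
    "fracture": "crack propagation",
    "cracking": "crack propagation",
    "alloys": "metals",
}


def normalize(terms):
    """Reduce free-language terms to preferred index terms."""
    return {THESAURUS.get(t.lower(), t.lower()) for t in terms}


def match(documents, inquiry_terms):
    """Return identifiers of documents whose description covers the inquiry."""
    wanted = normalize(inquiry_terms)
    return sorted(doc_id for doc_id, terms in documents.items()
                  if wanted <= normalize(terms))


if __name__ == "__main__":
    documents = {
        "report-1": ["Cracking", "Alloys", "fatigue"],
        "report-2": ["metals", "corrosion"],
    }
    print(match(documents, ["fracture", "metals"]))
```

The inquirer's words "fracture" and "metals" share no literal term with the description of "report-1"; the two descriptions meet only after both have been reduced to the same preferred terms, which is the role the quoted paragraph assigns to codes or thesauri.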

In our view, this is an excellent general statement of the 
problem of mechanizing information systems. The report 
discusses three distinct types of mechanization applicable to 
the handling of scientific information without, however, dis- 
tinguishing the differences that exist among them. For ex- 
ample, the first type might be termed ordinary data process- 
ing systems, such as those used by the Agricultural Research 
Service, Treasury Department, Department of Health, Edu-
cation, and Welfare, Federal Communications Commission,
Interstate Commerce, and the Railroad Retirement Board, as
summarized in the report. These systems involve the handling
of numerical data of variant kinds, names and characteristics
of organizations, individuals, etc., and are straightforward
data processing tasks that present no real problem to persons
skilled in data processing techniques. Fairly wide use is
presently being made of this type of system. For example,
the National Science Foundation employs mechanized data
processing techniques in maintaining its national register of
scientific and technical personnel, but it was not thought to be
relevant to this report on documentation, indexing, and re-
trieval of scientific information.

The second distinguishable type of mechanization, that of 
the production of printed indexes and bibliographies, is dis- 
cussed in some detail in the material submitted by the Atomic 
Energy Commission, the National Library of Medicine, and 
the Library of Congress. Necessary equipment, techniques 
and knowledge are available and ready for application to this 
problem, and although considerable work has been involved 
in devising the efficient systems described, there is no real 
difficulty in devising workable systems within a fairly short
period of time. To this point on page 45, AEC commented:

“Neither magic techniques nor custom-built machines have 
been responsible for the NSA [Nuclear Science Abstracts]
achievement. Success was due, instead, to the practical ap-
plication and combination of existing equipment and tech-
niques.”

The third type of mechanization is much more difficult and
is concerned with the processing of the subject content of 
scientific and technical documents for storage and automatic 
searching. In this area of mechanization we and other Fed- 
eral agencies support a good measure of research to determine 
how best to identify and to code the significant information in
scientific documents and whether these steps, as well as 
searching procedures, can be mechanized in order to handle 
large collections of information effectively. The tasks being 
mechanized by the Armed Services Technical Information 
Agency exemplify all three types of mechanization discussed 
above. Of the four listed on page 34, the first three fall 
into the data-processing type; the fourth falls into the mecha- 
nized production of printed indexes. The third part of the 
second stage is clearly of the third type, i.e., mechanization 
of searching.

One of the difficulties of mechanization of storage and 
search operations not touched upon in your report is that of
the effect of volume of material to be handled on the process.
Procedures that work well for a few thousand or tens of 
thousands of documents probably will not adequately handle 
the volume of material that would have to be processed for 
a major scientific discipline, such as chemistry, for example.
The Chemical Abstracts Service, to use an illustration, is now 
handling more than 150,000 research and technical papers and 
patents annually. The Library of Congress handles many,
many times that number of scientific publications. The
relationship of volume to the complexity may be stated as
follows: the larger the volume of material that is to be handled by a
retrieval system to serve scientists, the more detailed must be
the analysis and indexing of the subject content of the docu- 
ments. This must be so in order to provide the highly selec- 
tive discrimination that will make it possible to avoid swamp- 
ing the inquirer with too many documents of tenuous rele- 
vance to his question. 
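
The relationship stated here can be illustrated with a rough model. It is assumed, purely for illustration, that each index term in a question independently selects a fixed fraction of the collection. Neither that assumption nor the figures below, other than the 150,000 papers cited for the Chemical Abstracts Service, come from the Foundation's comments, and the function names are hypothetical.

```python
def expected_hits(collection_size, term_fraction, terms_in_question):
    """Documents expected to satisfy every term, assuming independent terms."""
    return collection_size * term_fraction ** terms_in_question


def terms_needed(collection_size, term_fraction, max_hits):
    """Smallest number of terms keeping the expected answer within max_hits."""
    if not 0 < term_fraction < 1:
        raise ValueError("term_fraction must lie strictly between 0 and 1")
    terms = 0
    while expected_hits(collection_size, term_fraction, terms) > max_hits:
        terms += 1
    return terms


if __name__ == "__main__":
    # Assumed: each term selects 5 percent of the collection, and the
    # inquirer can usefully examine no more than 10 documents.
    for size in (10_000, 150_000):
        print(size, terms_needed(size, 0.05, 10))
```

On these assumptions a collection of 10,000 documents requires three terms per question, while one year's intake of 150,000 papers requires four. The indexing must therefore carry more detail per document as the collection grows if the inquirer is not to be swamped.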


The report, as prepared by Dr. Adkinson and his staff, follows:


CURRENT DEVELOPMENTS IN SCIENTIFIC INFORMATION 


The reason scientific information has become a major 
problem is that the spectacular growth in science and tech- 
nology, particularly since World War II, has multiplied 
the volume of scientific information to a point where it can 
no longer be published promptly and adequately managed
within the framework of existing methods and organiza- 
tions. The increased rate of scientific discoveries accompa- 
nied by their rapid application through technology has also 
created an element of greater urgency in disseminating re- 
search results among scientists and engineers. Finally, since 
much that is significant in science is now being published in
unfamiliar languages, the working scientist is faced with al- 
most insuperable problems in attempting to keep himself 
informed on what he needs to know. 

It is apparent upon close analysis that any approach to 
effective solutions of the scientific information problem di- 
vides itself into two principal aspects—first, improvement 
of present information services which use known and tested 
systems and techniques, and, second, development of new and 
more powerful techniques, including mechanized systems, 
for coping with the rapidly expanding body of scientific and 
technical literature. 

It is the purpose of this paper to report on the widespread 
and vigorous efforts already underway and planned to pro- 
vide significant improvements to both parts of the problem. 
With regard to the first aspect, progress is being made 
through effective coordination and cooperative efforts by 
Government agencies, professional societies, and private 
organizations to improve existing facilities and techniques in 
such a way as to measurably increase the present availability 
of and access to scientific information. In the second in- 
stance, long-term programs of research and development are 
being carried out both by Government and private groups 
looking toward new and improved techniques, including
mechanized systems, for disseminating, processing, and
searching scientific information. The ultimate goal is of 
course to insure the ready availability to U.S. scientists and 
engineers of the world’s current and past output of signif- 
icant scientific information. 

Through legislative and Executive actions in 1958 and 
1959, the National Science Foundation was asked to under- 
take national leadership in efforts to improve the availabil- 
ity of scientific information to the entire U.S. scientific 
community. This has involved the review, coordination 
and stimulation of activities in all areas of scientific infor- 
mation, supplementing and assisting where necessary, and 
the development of solutions to problems through coopera- 
tion and coordination of the agencies and organizations con- 
cerned. The nature and scope of the Foundation’s efforts in 
this direction, the objectives and purposes of its support, and 
the kinds of activities supported are illustrated in this report. 


I 
FLOW OF INFORMATION 


Research produces a statement in words, numbers, graphs, 
or a combination of these, of the observed results of investi- 
gation, whether theoretical or experimental. A variety of 
motivations urges the scientist to add his results to the sum 
of human knowledge. In order for his additions to be valid 
and to satisfy his motivations, he feels bound to survey and 
utilize the record before making his contribution. This is the 
life cycle of information. 

No single description of the scientific communications net-
work is completely satisfactory. Both the existing and the
ideal information system may be described in administrative 
terms as consisting of generators, processors and users of in- 
formation, or the system may be described in terms of the in- 
formation itself according to the functions information per- 
forms, for example, to provide current awareness or a 
retrospective view of a topic or field. 

The attached charts (figs. 1 and 2) provide a generalized 
description of the flow of information from the generator to 
the user. 

Figure 1

How the Scientific Information Process Functions

[Chart not reproduced. Components shown: sources (reports, papers, theses); conventional publishers (journals, books); bibliographic services (abstracts, indexes); current research and other “unpublished” information; depositories, data centers, and reference services.]

Figure 2

Flow of Information

[Chart not reproduced. Legend: major channels; secondary channels. Legible box labels include “Report” and “Journal.”]

Figure 1, drawn from the point of view of the user of in- 
formation, shows the basic relationships between major com- 
ponents of virtually all comprehensive information systems, 
whether within a discipline of science, an industry, a large 
technically oriented company, or a Government agency such 
as the Atomic Energy Commission. There is first a source of 
information. In the center are all users of information. 
There are primary journals which publish for the first time 
most of the information generated by research. The second- 
ary services gather, condense, and reorganize papers and 
other collective units of information in the form of abstracts 
and indexes, bibliographies, handbooks, and similar reference 
tools. At the bottom of the chart are the data and reference 
services which may publish highly specialized collections of 
information in particular ways. These services may also 
answer specific technical questions addressed to them. Figure 
2 traces the typical life history of a research finding, including 
some of the complexities which have been created in response 
to needs that the basic system cannot meet. The boxes behave 
as virtually independent variables that help as well as 
hinder the flow of information. 

In terms of the scientific community outside the labora- 
tory of an investigator, the results of research first take one 
or more of three forms (fig. 2). A report may be written 
and distributed outside the laboratory in several hundred 
copies. An oral paper may be delivered to a gathering of 
scientists. The most conventional and easiest to control form 
is the technical paper written for publication throughout the 
world today. A conservatively estimated 1,250,000 original
papers will be published in these journals in 1960. 

The paper published in a primary journal is automatically 
accessible to whatever group of scientists include that journal 
in their regular reading or scanning matter, either person-
ally or through library subscription. The paper is also au-
tomatically exposed to a group of scientific abstracting and
indexing services, estimated to number as high as 3,500 
worldwide. There are 14 so-called major abstracting and 
indexing services in the United States which will print about 
600,000 abstracts and title listings in 1960. 

The potential audience of the paper is increased signifi- 
cantly by the number of scientists aware of and accustomed 
to use the abstracting service which reports it. A chemical 
engineer who may regularly read only one primary chemical 
journal may more regularly scan parts of Chemical Abstracts 
and thus expose himself to more than 100,000 papers gleaned 
from about 9,000 journals. 

The next step lessens the size of the audience exposed to
the paper but increases the forcefulness of exposure. It is a 
step of growing importance. Data and reference centers 
comb the primary journals and secondary services, as well as 
other, less conventional sources, to compile only those parts 
of the available information which are pertinent to the spe- 
cific topic of concern to the center and to the individual 


99 





100 


DOCUMENTATION OF SCIENTIFIC INFORMATION 


scientist, for example, cobalt, the physical properties of hy- 
drocarbons, or the Arctic. 

Because the figure 2 system is the product of an historical 
development upon which the international communication of 
scientific information still fundamentally rests, it must be 
dealt with seriously and improved wherever possible. The 
status of the system will be discussed in more detail later in 
this paper. 

Thus far, the traditional system through which the largest 
volume of research results is disseminated has been outlined. 
The balance of this review will treat each of the blocks of 
figure 2, grouped in a manner which emphasizes the prin- 
cipal interrelationships which exist. 


II
RESEARCH ON SCIENTIFIC INFORMATION PROBLEMS 


Rapid and effective large-scale systems for organizing and 
searching the information content of scientific publications 
are essential if scientists are to be able to locate needed infor- 
mation without spending a large proportion of their valuable 
time on laborious searching through the literature. These
systems should incorporate mechanized procedures wherever
mechanization promises to increase their speed or the accu-
racy and detail with which information can be retrieved.
The design of these improved systems and the development
of procedures for using high-speed machines in the process-
ing of scientific information require thorough study of the
actual information requirements and practices of scientists, 
research on and experimentation with possible ways of organ- 
izing and searching information so as to best meet their re- 
quirements, and finally, testing and evaluation of proposed 
new procedures and systems. In addition, research on ways 
of using electronic machines to translate automatically from 
one natural language into another is important, not only be- 
cause of the acute need for more translations of foreign 
scientific publications, but also because it is believed that the
solution of problems encountered in attempting to write ma- 
chine programs for the accurate analysis and translation of 
natural languages is likely to contribute also to the develop- 
ment of mechanized systems for processing and searching 
scientific information. The raw material in both cases is 
natural language. 


Information requirements of scientists 

Although there is a good deal of well-informed subjective 
appraisal of the information requirements of scientists, there 
is a pressing need to obtain precise, objective knowledge of 
their requirements in order to arrive at general agreement on 
measures that might be expected to bring about the greatest 
improvements in the dissemination, organization, and use of 
scientific information. For the present, it is necessary to 
plan and act on the basis of subjective analysis of require-
ments and of the adequacy with which they are being met,
but concurrently there must be developed through well- 
planned studies and experiments a deeper understanding of 
the actual role of information in research and other scientific 
activities, of the adequacies and inadequacies of present com- 
munication patterns and information services, and of the 
relative importance of the varied information requirements 
of scientists that are not being met satisfactorily. 

From studies of the information requirements of scientists, 
the Foundation hopes to gain a much better understanding of 
the ways in which scientists now communicate among them-
selves, how they make use of the scientific literature and avail-
able reference services, and the extent to which existing pub-
lications and services meet the actual needs of scientists. The
Foundation also hopes to obtain more precise knowledge of 
the functions of information itself and of various types of 
information services in the research process. Such knowledge 
will help to determine the best overall pattern of information 
services for science, and will also help in the design of search- 
ing systems that will best meet the users’ requirements. 

The Foundation has supported certain fundamental studies 
in this area. These have been pilot studies designed primarily 
to test different methods of gathering data on communication 
among scientists and on the use of the literature, because one of
the most difficult aspects of information requirements research 
is that of obtaining objective data which accurately describe 
the topics being studied. 

One such study, at the Case Institute of Technology, made 
use of operations research techniques to determine as objec- 
tively as possible how a carefully chosen sample of chemists 
spend their time. Perhaps the most striking finding of this 
study was that industrial chemists spend 32 percent of their 
working day in scientific communication, both oral and writ- 
ten. When their after-hour activities are included, the time 
spent on scientific communication is even more striking, 
amounting to about 51 percent of the total time spent on 
scientific activities. These figures present convincing evi- 
dence of the value of efforts to make scientific communication 
more effective and efficient. The work at Case is now con- 
tinuing with the gathering of more extensive data on the 
actual publications used by chemists and physicists. 

The data-gathering phase of these studies will need to go 
on for some time, but experimental courses of action that hold 
promise of improving dissemination and effective use of 
scientific information will be tried concurrently. The prin- 
cipal purposes of gathering data on how scientists use present 
information services are to gain insight into the needs that are 
not now being met, and to provide a basis of comparison to 
show to what extent the situation has been improved by 
introduction of new or modified services. 

The Bureau of Applied Social Research at Columbia Uni- 
versity has just prepared for the National Science Foundation 
a critical review of all studies to date of the flow of informa- 


101 





102 


DOCUMENTATION OF SCIENTIFIC INFORMATION 


tion among scientists. The review discusses the various 
methods that have been used in the studies and compares the 
findings, insofar as they are comparable. It summarizes what 
has been learned thus far, and suggests other approaches that
might be tried in future studies. It will be useful in planning 
further research on scientists’ information requirements. 

Studies by scientists of their information systems

The overall communications systems now in use are largely 
the product of development by scientists themselves. Since 
it is scientists who generate information which they wish
other scientists to learn about, and, inversely, it is scientists 
who are most concerned with the work of other scientists, 
it is natural that individual scientists and organizations of 
scientists in professional societies and academies have created 
and developed their own information transmitting mecha- 
nisms. What has been lacking, however, is a major, continu- 
ing effort to consider the total system which these mecha- 
nisms constitute. The Foundation has been encouraging 
scientific organizations representing broad fields of science 
to review and analyze the information systems within their 
fields in order to propose means for strengthening or modi- 
fying them where necessary. 

A physics study, by the American Institute of Physics, has
been underway for about a year. In biology the American 
Institute of Biological Sciences has established an ad hoc 
group which has held two preliminary meetings to define 
the problem in this field and to plan a definite procedure 
for a large-scale investigation which should get underway 
shortly. Common to both studies has been the recognition 
by leading scientists in the fields involved (a) that the infor-
mation problem is very complex; (b) that simple state-
ments like “What we need is more abstracts” or “What we
must do is mechanize” or “Scientists should write less,” if
taken in isolation, ignore a whole host of variables and vastly
oversimplify the problem; and (c) that any long-range or
lasting solution must consider all phases of information re- 
cording, dissemination, and control. 

1. The American Institute of Physics (AIP) Study.— 
The initial NSF grant in 1958 permitted the AIP to estab- 
lish a documentation research department to study the prob- 
lems of disseminating physics information and to devise and 
recommend improved methods where these are needed. The 
department is to be a permanent part of the institute with 
NSF support tapering off as society funding is built up. 
Since the institute long has conducted a major publishing 
program, involving the dozen or so principal U.S. physics 
journals, it is natural that the study has begun with investi- 
gation of various aspects of publication in the broadest sense 
of the word. Among the specific areas currently being stud- 
ied are physics abstracting, rapid indexing of physics jour- 
nals, publication costs, new developments in printing tech- 
niques, specialized versus general journals, information 
needs and information-gathering practices of different
groups of physicists, photocopying and copyright problems,
magnitude of the world’s physics literature, preparation of 
critical reviews, and the like. 

2. The American Institute of Biological Sciences (AIBS) 
study.—The AIBS study, which will consider at length the 
problems outlined by two preliminary meetings of an ad hoc
group convened for the purpose, differs from the AIP pro- 
gram in that as now planned, it is to be a terminating project 
rather than a permanent activity. It is expected to get un- 
derway about July 1, 1960. Problems to be investigated for 
the biological field parallel to some extent those under study 
by the physicists and include, in terms of AIBS’ preliminary 
proposal to NSF, biologists’ needs for and uses of scientific 
information, methods for improving the effectiveness of dis-
tribution of biological information, characteristics and im-
provement of the information activities of existing agencies,
and the like. 

In both of these studies the possibilities of mechanization 
are being, and will be, kept in mind and considered in all 
aspects where it has potential for present application or seems 
to offer promise for the future. 

Nonmechanized information processing 

1. Evaluation of present methods.—There is further need
for evaluation and comparison of present methods for organ-
izing and searching the subject content of scientific publica-
tions and reports. One example of such work is the care-
fully controlled comparative study of two different indexing
systems and two different types of classification systems that 
is being conducted by the British Association of Special Li- 
braries and Information Bureaus, with funds supplied by 
the National Science Foundation. Such studies help to de- 
termine the actual effectiveness of different methods of sub- 
ject analysis in locating information that is sought. 

Much work needs to be done to compare methods now in 
use with newer, partially mechanized methods. Directors of 
information services have in general been reluctant to adopt 
mechanized techniques until their superiority over traditional 
methods has been demonstrated. An example of this attitude 
outside the Government is found in a recent discussion of 
improved technical information services (in Special Li- 
braries, November 1959, p. 445) by Mr. Harry B. Goodwin 
of the Battelle Memorial Institute. Mr. Goodwin said that
the reason machine methods have not been used in the De- 
fense Metals Information Center there “is simply that we 
don’t feel that any existing machine offers any advantage 
over the special multiple-entry filing system developed at 
Battelle and now being used.” 

One aspect of the test program at Western Reserve Uni- 
versity being supported by the Foundation will include test 
searches and related studies designed to compare the relative 
efficiency of manual and mechanized searching methods. 

2. Systematic organization of knowledge.—There have
been relatively few studies of principles of organizing or
classifying knowledge systematically and few attempts to 
devise better systematic organization irrespective of the tech- 
niques (manual or mechanized) subsequently used in process- 
ing information. In a recent published statement (The 
American Psychologist, June 1959, p. 270), the American 
Psychological Association referred to this problem as the
“problem of encyclopedic organization of knowledge and 
codification of methods, measures, and results,” and recom- 
mended that psychologists attend to it immediately. 

As an example of activity in systematic organization in a 
particular field, Herner & Co., of Washington, D.C., with the 
aid of a grant from the Foundation have analyzed reference 
questions in the field of atomic energy and have designed a 
classification system for the field that is adaptable to mecha- 
nized searching devices. The new system will be tested in 
comparison with the subject heading system now in use in 
the Atomic Energy Commission libraries. 

Even though it is expected that machines will eventually 
be of widespread assistance in information handling, there 
is good reason to work toward better information organiz- 
ing methods that can be used in manual systems. Manual 
techniques will undoubtedly continue to serve a useful func- 
tion in particular situations perhaps indefinitely, even when 
mechanized searching systems have been developed and are 
in operation in some reference centers. 

It should be kept in mind, of course, that some of the 
present work on the mechanization of information process- 
ing, such as the current efforts to develop mechanizable 
techniques for controlling relationships among indexing 
terms, may also suggest more effective ways of organizing 
knowledge for manual systems. 

Certain other studies, such as those of the utility of various
notation systems for chemical compounds, may pro-
duce results applicable to either manual or mechanized sys-
tems, or both; it remains to be determined whether different
notation systems will be needed for manual uses and for 
mechanized systems, or whether a single system can serve 
both purposes. 


Mechanization of information processing 

The great majority of current research and experimenta- 
tion in the field of scientific information is directed toward 
the mechanization of procedures for information processing. 
Appendix A contains a checklist of “Projects in the General
Area of Mechanized Handling of Information Supported by
the National Science Foundation.” Examples of activities
of other Government agencies and of private groups per-
taining to mechanized handling of information are described
in appendixes B and C.

1. Experience with Operating Systems.—Thus far a large
majority of the operating systems that make use of some 
mechanized procedures are located within individual indus- 
trial organizations. The subject matter encompassed by any 
one system is fairly homogeneous; and for the most part
it is chemical or biochemical in nature, fields in which the 
information itself has a more apparent structure and is more 
easily coded for mechanized systems than is information in 
other fields. Very few of the operating systems cover more 
than 25,000 documents, and only two or three cover more 
than 100,000 documents—all very small systems compared 
with the problem of handling all scientific literature, or even 
all the literature of one discipline. Experience with such 
systems has been valuable and encouraging, but totally differ- 
ent procedures may be needed to handle very large volumes of 
material. 

2. Mechanization of the More Routine Storage and Re- 
trieval Processes.—It is important to realize that in all of 
the operating systems to date and in some of the documentation
research activities mechanization is being applied only
to routine steps that follow the subject analysis of documents. 
Human judgment must still be employed to produce some 
sort of representation of the information selected to indicate 
the content of each document. These representations of con- 
tent may take many forms, ranging from lists of significant 
individual words to complex index entries that indicate rela- 
tionships among concepts. Procedures have been developed 
and are in use in experimental and operating systems for 
automatically converting these humanly produced representa- 
tions of the original documents into coded forms more suitable 
for mechanized processing; for the automatic manipulation 
of the coded representations to identify documents dealing
with certain subjects or meeting certain specifications; and
also for the automatic reproduction of bibliographic descrip- 
tions and abstracts of the selected documents, and sometimes 
of the full documents themselves. Advances to date in the 
mechanization of storage and retrieval processes have been 
primarily in these areas. 
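
The routine steps described above (coding humanly produced index entries, then manipulating the codes to select documents) can be illustrated with a minimal sketch. The document numbers, index terms, and function names below are hypothetical and are not drawn from any system mentioned in this report; the sketch assumes that a human indexer has already assigned the terms.

```python
# Hypothetical index records: document number -> terms assigned by a human indexer.
INDEX = {
    101: {"steel", "corrosion", "seawater"},
    102: {"aluminum", "corrosion"},
    103: {"steel", "fatigue"},
}


def build_term_codes(index):
    """Assign each distinct index term a small integer code."""
    terms = sorted({term for assigned in index.values() for term in assigned})
    return {term: code for code, term in enumerate(terms)}


def encode(index, codes):
    """Convert each document's term list into a set of integer codes."""
    return {doc: {codes[t] for t in assigned} for doc, assigned in index.items()}


def search(encoded, codes, required, excluded=()):
    """Return documents carrying every required term and no excluded term."""
    need = set()
    for term in required:
        if term not in codes:
            return []  # a term never used in indexing cannot match anything
        need.add(codes[term])
    avoid = {codes[t] for t in excluded if t in codes}
    return sorted(
        doc for doc, have in encoded.items()
        if need <= have and not (avoid & have)
    )


CODES = build_term_codes(INDEX)
ENCODED = encode(INDEX, CODES)
print(search(ENCODED, CODES, ["steel", "corrosion"]))
print(search(ENCODED, CODES, ["corrosion"], excluded=["steel"]))
```

Only the coding and matching are mechanized here; deciding which terms describe a document remains the human step that the paragraph above identifies.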

An example of a continuing research program in this area is 
the experimental literature searching center for metallurgists 
at Western Reserve University. This program, supported 
jointly by the American Society for Metals and the National 
Science Foundation, consists of the experimental use and 
evaluation of procedures developed at Western Reserve for 
the mechanized encoding and searching of index data. It is
expected that within the next year 40,000 documents of in- 
terest to metallurgists will be handled in this fashion. The 
procedures will be evaluated by studies of their effectiveness 
in identifying documents relevant to requests for information 
and also by the extent of metallurgists’ use of the service. 
The National Academy of Sciences is naming an ad hoc 
committee of metallurgists and information specialists to 
evaluate the results of the test program. 

3. Mechanization of Document Analysis.—A great deal of
current documentation research is devoted to finding ways
in which machines can replace humans in the initial processing
of documents. For automatic indexing, statistical methods
based on frequency of use of words in documents are
being investigated. Similar procedures are also being tested 
in the production of “automatic abstracts.” The effectiveness
of such procedures in selecting the significant portions of
scientific papers has yet to be demonstrated. Other research 
is aimed at converting ordinary English into a more “normalized”
language (with simplified grammar and with synonyms
controlled in some fashion), suitable for use within a mechanized
system. Still other research is concerned with mechanized
procedures for the syntactic analysis of sentences of
natural languages; such procedures might well constitute the 
first step of an automatic process for analyzing, abstracting 
and indexing the full texts of documents and also of requests 
for information. It appears that, in time, mechanized syn- 
tactic analysis will almost certainly be practical, at least for 
English and Russian. Further research is needed to deter- 
mine whether automatic indexing and abstracting procedures 
and normalized languages will be of practical use in storage 
and retrieval systems. 
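
A statistical indexing method based on frequency of use of words might, in its simplest form, look like the following sketch. The list of common words, the cutoff values, and the sample text are assumptions made for illustration only; the sketch does not reproduce the procedure of any particular research group.

```python
import re
from collections import Counter

# Assumed list of common words carrying little subject content.
STOP_WORDS = frozenset({
    "the", "of", "and", "a", "in", "to", "is", "are",
    "for", "by", "on", "with", "that",
})


def index_terms(text, max_terms=3, min_count=2):
    """Pick the most frequent non-common words as candidate index terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Rank by descending frequency, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, n in ranked if n >= min_count][:max_terms]


SAMPLE = (
    "Corrosion of steel in seawater. Steel corrosion rates are measured. "
    "Seawater attacks steel."
)
print(index_terms(SAMPLE))
```

Such a count treats every occurrence of a word alike and knows nothing of synonyms or of relationships among concepts, which is one reason the effectiveness of procedures of this kind has yet to be demonstrated.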

Long-range research projects sponsored by the Foundation 
in this area are concerned primarily with studies of language 
and are exploring the possibilities of systematizing and 
mechanizing the analysis and processing of the actual texts of
documents. 

Research at the University of Pennsylvania on the possible 
application of certain techniques of linguistic analysis to
scientific texts is directed toward the development of computer 
programs to conduct phrase-structure analysis of English 
sentences; to reduce the “transformed constructions”
occurring in the text to much simpler, more uniform “kernel” 
constructions, with minimum loss of information; and to 
identify the significant words or phrases for indexing and 
abstracting purposes. A computer program for the first stage 
of phrase-structure analysis has been tested on a UNIVAC 
computer. It is believed that the program is adequate to 
handle the great majority of the sentence constructions that 
occur in scientific English. 

A research group at the Itek Corp. is endeavoring to sys- 
tematize the operations of information searching systems in 
order that they can be mechanized at least partially; such 
operations include the selection of significant information 
items in a scientific text for indexing purposes, the conversion 
of the selected index data into a normalized language, the
interpretation of search questions and their conversion into the
same normalized language, and the planning and programming
of searches. The project includes further development
of a method of representing natural language expressions
in normalized form; and the development and testing of
essential tools, such as a dictionary of terms and of operating 
rules governing the conversion of natural language expres- 
sions into their normalized equivalents, and a thesaurus-type 
device that records and displays the complex semantic rela- 
tions among words and expressions. The results obtained 
with these systematized procedures will be tested against the 
highest quality conventional indexes. 
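
The role of a dictionary of terms and a thesaurus-type device in such a scheme can be suggested by a small sketch. Every entry and name below is hypothetical; the sketch assumes only that index data and search questions are both converted into the same normalized vocabulary before they are compared.

```python
import re

# Hypothetical dictionary: variant expression -> normalized term.
NORMAL_FORM = {
    "rust": "corrosion",
    "rusting": "corrosion",
    "corrosion": "corrosion",
    "steel": "steel",
    "steels": "steel",
    "sea water": "seawater",
    "seawater": "seawater",
}

# Hypothetical thesaurus relation: normalized term -> broader term.
BROADER = {
    "steel": "ferrous alloy",
    "corrosion": "chemical degradation",
}


def normalize(expression):
    """Return the set of normalized terms found in a natural language expression."""
    text = expression.lower()
    found = set()
    for variant, normal in NORMAL_FORM.items():
        # Match whole words or phrases only, so "rust" is not found in "trust".
        if re.search(r"\b" + re.escape(variant) + r"\b", text):
            found.add(normal)
    return found


def expand(terms):
    """Add the broader term recorded for each normalized term, if any."""
    return set(terms) | {BROADER[t] for t in terms if t in BROADER}


index_entry = normalize("Rusting of steels in sea water")
question = normalize("corrosion of steel by seawater")
print(index_entry == question)
```

Because the index entry and the search question reduce to the same normalized terms, they match even though they share almost no words as originally written.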








The Chemical Abstracts Service, as part of a broad program
to develop ways of mechanizing some of its operations,
is conducting a study of the semantics of chemistry in order to
learn how to handle semantic relationships systematically.


Mechanized procedures for the synthesis of information 


It also appears that mechanized procedures could be de- 
vised to help in finding new relationships not apparent from 
any single document. For example, machines might be made 
to integrate data from different documents or sources, or to 
group items of information in new ways not conceived by the 
original authors or processors of the documents. Little work 
is being done in this area at this time, but it can be expected 
to increase as time goes on. Some work being done in other 
fields of science may later be applicable to this problem. 
Digital computers are being used to simulate synthesis 
processes in mathematics and logic and are also being pro- 
gramed to improve their own performance in proving 
theorems and playing games by “learning” from their own
experience. Synthesis processes, however, must be better 
understood before the potentialities of their mechanization 
can be realized. 


Mechanical translation 


Closely related to research on the automatic processing of 
documents for input into a storage and retrieval system is the 
research being carried out in an effort to mechanize the 
translation of texts from one language into another. It is 
not yet possible to produce by mechanical means translations
which are at all comparable to translations produced by competent
human translators. Research to date indicates that
machines will ultimately be able to produce useful translations;
the big stumbling block at present is that machines,
although they can be made to substitute words in one language
for words in another language, often cannot select the
proper English equivalent for each word in its particular context
and cannot arrange the selected words into grammatical
sentences. The result is that present machine “translations”
consist largely of ungrammatical sequences of words rather
than of coherent sentences, and their interpretation is therefore
usually a matter of guesswork. Those willing to accept
ungrammatical sequences of words containing some suggestion
of the content of the original foreign language text, in
lieu of actual translation, may find existing procedures for
mechanical translation from Russian to English of some use.
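
The limitation just described can be seen in a minimal sketch of word-for-word substitution. The glossary is hypothetical, the Russian words are transliterated, and the procedure is an illustration only, not a description of any research group's program: it takes the first listed English equivalent for each word and keeps the word order of the original.

```python
# Hypothetical glossary: transliterated Russian word -> English equivalents,
# with the first equivalent used as the default.
GLOSSARY = {
    "mne": ["to me", "me"],
    "nravitsya": ["pleases", "is liked"],
    "eta": ["this"],
    "kniga": ["book"],
}


def substitute(sentence):
    """Replace each word by its default equivalent, keeping source word order."""
    output = []
    for word in sentence.lower().split():
        equivalents = GLOSSARY.get(word)
        # Words missing from the glossary are passed through in brackets.
        output.append(equivalents[0] if equivalents else "[" + word + "]")
    return " ".join(output)


print(substitute("Mne nravitsya eta kniga"))
```

A human translator would render the sentence as "I like this book." The substitution procedure has no means of choosing among equivalents by context or of rearranging the words into a grammatical English sentence.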

Mechanical translation research is a comparatively young 
branch of scientific research. Ten years ago, the idea of 
translating by means of a computer was still in the discussion 
stage. Today, there are 11 well-established mechanical 
translation research groups in this country. Of these, seven 
are located within universities, which serves to indicate the 
academic nature of much of this research. The researchers 
themselves have come to appreciate the magnitude and long-term
nature of the problem, and the great majority expect
that many more years of research will be needed before any 
really satisfactory results can be achieved. 

The Foundation is supporting four projects in the field of 
mechanical translation. NSF interest in this area of research 
has two explanations. First, a need is seen for more trans- 
lations of scientific publications from some languages than 
can now be produced with the available skilled human trans- 
lators. Secondly, and perhaps more importantly, these proj- 
ects, which are concerned with language structure and prob- 
lems of multiple meaning, will produce results that will be 
applicable in part to the processing of information for storage 
and retrieval systems. In both cases the raw material to be 
handled is expressed in natural languages. 

The Foundation-sponsored research groups at Harvard and 
the University of California are endeavoring to develop pro- 
cedures that within a few years are expected to produce trans- 
lations automatically from Russian to English that will be 
useful for some purposes, although perhaps crude in certain 
respects. Work at Massachusetts Institute of Technology is 
aimed at producing the much more detailed knowledge of the 
way in which the German and English languages function 
that is believed to be necessary if we are ever to achieve high- 
quality translation by machine. The Cambridge Language 
Research Unit in England is studying languages primarily 
from the point of view of their semantic organization, and is 
exploring the possibility of designing a special thesaurus that 
will resolve meaning problems in a mechanical translation 
process. 

There is much interest among the general public in the 
possibilities of mechanical translation, but there also seems 
to be a great deal of misunderstanding as to the present status 
of work in the field. The Foundation regards the present 
work as still very much in the research stage, with many prob- 
lems still to be solved before it is possible to get useful trans- 
lations from machines. Present experimental output is both 
crude and incorrect in many respects and should not be 
construed as “translation” in the usual sense. It would have 
to be thoroughly checked and revised by human translators 
competent in their subject fields who know both of the lan- 
guages involved, before it could be used without danger of 
misunderstanding. Such revision of the machine output 
might require more time and effort than preparation of the 
translations in the usual way. Study and revision of experi- 
mental output, however, will identify the unsolved problems 
and will be of assistance in further research on mechanical 
translation. For this reason, some researchers find it useful
to put sizable quantities of text through machines, using the
translation rules that have been developed; but it is premature
to expect useful machine output in the very near future. 


Special-purpose information-processing equipment 


Computer technology probably is sufficiently advanced to
permit the development and construction of special-purpose
equipment to perform any specific well-defined information- 
processing task. Experience has indicated that, generally 
speaking, complete systems and the procedures that comprise 
them should be carefully thought out and tested before
special-purpose equipment is designed and built. The large 
and fast general-purpose computers are so expensive that 
their cost cannot as a rule be justified solely for information 
storage and retrieval or mechanical translation. They are 
quite versatile, however, and can be used to simulate proposed 
new information-processing systems and their component
procedures and devices. The general-purpose machines are 
therefore excellent tools for the research on new systems that 
should precede the drawing up of plans for new special- 
purpose equipment. 

The feasibility of mechanized systems to handle the texts 
of documents on a large scale, for either translation or information
searching, will depend on the availability of versatile
print-reading devices. Present-day readers can handle
only one style of typewritten or printed characters or materials
that have been especially prepared with magnetic ink.
Readers that will handle a variety of printed materials have
yet to be developed, but there is reason to hope that they will
be available by the time they are needed for processing large
volumes of material in operating systems, provided sufficient
funds are put into their development in the immediate future.
If they could be developed quite soon, they would greatly 
facilitate areas of research in which large quantities of 
linguistic data must be processed for study. 


Needed research and problems 


In all of the documentation research areas that have been 
discussed more good work is needed. The field of informa- 
tion research is still very much in its infancy, and there is 
as yet no widespread agreement on the major problems or 
their relative importance. Perhaps there can be no such 
agreement and no precise definitions of the research prob- 
lems until there is a much deeper understanding of the 
strengths and weaknesses of the whole complex of scientific 
information services and practices and of the true requirements
of scientists, which should determine our research
goals.

The major difficulty is the insufficient number of
competent investigators interested in information problems 
and willing to plan and undertake the research that needs 
to be done. Much of the needed research must be inter- 
disciplinary in nature and must draw on the combined talents 
of information specialists, linguists, logicians, mathemati- 
cians, philosophers, computer engineers, statisticians, opera- 
tions research specialists, psychologists, behavioral scientists, 
and specialists in various other scientific disciplines. Fur- 
thermore, experience has suggested that the best qualified and 
most productive researchers in the field of information processing
may be those thoroughly trained in two or more of
the fields listed above, for example linguistics and logic, or 
mathematics and linguistics and computer engineering. The 
number of persons with such multiple training is increasing
but is still relatively small. 

These two difficulties, lack of definition of and agreement 
on the major research problems and shortage of competent 
specialists interested in the problems, go hand in hand. The 
specialists are needed to help define the problems as well as 
to solve them; and yet many specialists who could contribute 
significantly have probably not become interested because the 
problems have not been clearly defined. In spite of, or per- 
haps because of, these difficulties, every effort must be made 
to stimulate and support good research work in the various 
areas. Even though research problems have not been ade- 
quately defined, it is possible to name some areas where more 
work is clearly needed:

1. Information requirements of scientists.—The identification
and assessment of information requirements is an extremely
difficult area of study because requirements vary
tremendously from one organization to another, from one 
scientist to another, and even from one period of time to an- 
other for the same scientist. The most pressing need there- 
fore is for the development of reliable methods for studying 
and assessing requirements, for determining the role of in- 
formation and information services in science, and for meas- 
uring the value of information and the utility and effective- 
ness of present and proposed services. Moreover, if at all 
possible, these methods of study should be so designed as to 
leave the scientists themselves and their work undisturbed.
The scientific societies interested in planning and undertak- 
ing studies and experiments in this area should be given every 
encouragement and support, and attempts should be made to 
stimulate scientists in all fields to help define their own needs 
in precise terms. 

2. Improvements in nonmechanized information processing.—The
first need is for more carefully controlled studies
to evaluate and compare present methods of indexing, or- 
ganizing and searching information. Such comparative 
studies should then be extended to include new and proposed 
methods and also to assess the relative merits of manual and 
mechanical techniques. This last type of comparison needs
to be done as objectively as possible, for otherwise the “glam- 
our” of mechanization may cause some organizations to 
switch to new, mechanized methods without knowledge of de- 
ficiencies that may prove to be quite serious. It is quite pos- 
sible that indexing and classification systems for manual use 
could be greatly improved and could provide excellent service 
in many situations. It therefore appears that more funda- 
mental work on principles of and improved systems for or- 
ganizing knowledge that could be used with either manual 
or mechanized techniques would be most useful.

3. Mechanization of information processing.—Although partially
mechanized storage and retrieval systems are in operation
now for a number of relatively small collections of information,
one immediate research task is to determine
whether the same procedures will provide effective and effi- 
cient services for much larger collections. The belief that 
many or most of these systems are not suitable for large
collections is fairly widespread, so it is important that pro- 
cedures designed to handle very large volumes of information 
be developed and tested. 

In this area, the Foundation is supporting at Western Re- 
serve University a large-scale test program involving infor- 
mation extending into several disciplines. At Chemical Ab- 
stracts Service, three grants have been made for projects lead- 
ing toward mechanization of the processing and searching 
of chemical information. 

If mechanization is to be extended beyond the more routine 
information processing steps, it is clear that more work to- 
ward mechanizable procedures for dealing with language is 
essential, for the language of documents is the raw material 
of any storage and retrieval system and, of course, of any 
translation system. It is also clear that mechanized systems 
for searching or translating information should be capable 
ultimately of maintaining records of their own perform- 
ance and updating and improving their own procedures as a 
result of this accumulated experience. Research is under- 
way in all of these areas, but we are still only in the initial 
stages of this new and complex field of mechanized informa- 
tion processing. Much greater research efforts will be needed 
for some years to come if mechanization on a large scale is to 
be achieved and employed in the service of science. 


PROBLEMS AT THE SOURCE 


The information problem obviously begins to take shape 
at the point where information is generated, in research and 
development establishments. These include Government 
agencies and contractors, industrial companies, and educa- 
tional and other nonprofit research institutions. The great 
influence of the source of scientific and technical information 
is frequently overlooked or underestimated, but it needs only 
to be pointed out that every source of scientific information 
is in turn a user of scientific information. 

Government reports 
Creation by the left hand of problems to be solved by the 
right is particularly evident in the area of “unpublished” re- 
search reports. The report is an increasingly popular form 
for the first appearance of research results. Especially in the 
United States the report has achieved stature as a primary 
medium for dissemination of new information. Many re- 
search scientists and engineers seem to consider the report a 
boon to the producer of information and a curse to the user, 
even though a given individual is often both. Regardless of 
the acceptance of reports by producers of research results, 
especially those within Government or Government-contractor
establishments, in most cases, from the point of view of
the scientific community as a whole, the information in a re- 
port is doomed to a relatively ineffective existence unless it 
subsequently and promptly appears in a conventional pub- 
lication. 

Because this review is concerned primarily with the flow of 
information through the scientific community generally, a 
critical attitude toward the report as a medium of dissemi- 
nation must be seen in that light. Within a small segment of 
the community, such as one company, or a Government 
agency or research contractor, the report is the obvious device
for informing the sponsor of the research of the results.
In other words, reports are vital parts of the communications
process, within the limits of the restricted audience
which has, in effect, a proprietary right to them. What needs
to be questioned, however, is the tendency to consider the dis- 
semination task ended with the issuance of a report. 

The report problem has been approached from two oppos- 
ing directions at once. First, an interagency cooperative pro- 
gram is assuring that the maximum number of reports 
prepared with public funds is called to the attention of and 
made available to the public. The second attack is on the 
information itself, to guide as much of it as possible into more
conventional information channels. 

Broad interagency awareness of the program is maintained 
through the Federal Advisory Committee on Scientific In- 
formation, composed of senior members of Federal agencies 
with large science information activities. This committee is 
chaired by the head of the Office of Science Information 
Service of the National Science Foundation. The Foundation
also acts to coordinate activities of the Office of Technical
Services, Department of Commerce; the Science and
Technology Division of the Library of Congress; and the 
Armed Services Technical Information Agency as the prin- 
cipal agents of the Government report dissemination pro- 
gram. The Atomic Energy Commission, a major report 
producer, utilizes OTS as its public disseminator as do other 
agencies.

The OTS has doubled in the past 2 years the number of 
unclassified reports which it announces and makes available 
by sale to the scientific community and general public. Most 
of the increase has consisted of basic research reports, a type 
not usually included in earlier OTS announcement media. 
The Science and Technology Division of the Library of Con- 
gress has expanded and organized its catalogs and other 
bibliographic records of Government reports. Its Reports 
Reference Center is, therefore, equipped to perform more 
comprehensive literature searches and to provide a higher 
quality of reference service for a continually increasing num- 
ber of users. In addition, ASTIA and OTS are jointly 
studying means for more effective cooperation between them 
as the largest reports handling agencies of Government. 

These are the kinds of activities, stimulated and supported 
by the Foundation, which have been developed to make unpublished
information more accessible to those who need it.
In addition, the Office of Science Information Service has 
been preparing and publishing a series of bulletins entitled 
“Scientific Information Activities of Federal Agencies.” 
These bulletins are in addition to the principal announce- 
ment medium, U.S. Government Research Reports, the 
monthly OTS journal. The bulletins identify in detail the 
areas of scientific research and development of individual 
agencies, the availability of information about research in 
process, the types of information generated by agencies, the 
scientific publications produced, how they are announced, and 
means for obtaining them. Scientists, librarians, and other 
information users are thus aided in locating and obtaining 
“unpublished” Government reports. Bulletins have been 
published through the Government Printing Office on the U.S.
Department of Agriculture; U.S. Department of the Navy;
U.S. Department of Commerce—Part I; and U.S. Government
Printing Office. Others are in various stages of preparation
or publication.

However beneficial these actions may prove, they are only 
steps in the right direction on what is still a long road. 
Eventually a single medium may be needed to announce all 
Government research reports. Time and study may also re- 
veal more clearly the need, feasibility and value of a single 
regional depository system for all Government scientific reports
to replace the many depository systems now operating.
Study of the economics of primary versus secondary report 
reproduction and distribution is also planned. For example, 
it would appear advantageous for all agencies which gen- 
erate reports to produce and distribute optimum quantities 
initially in order to minimize the delays and additional costs 
associated with making and filling secondary requests on a 
special basis. 

Standardized and streamlined systems of classes of re- 
ports, code designations and format are urged by reports 
users. Effective retrieval of reports as reference material 
is seriously hampered because of the wide variety of labels 
and codes which they bear and which agencies attach to their 
reports in great but often meaningless variety.

These kinds of problems can best be attacked and solved
by Federal agencies acting cooperatively. In such cases, the
Foundation’s Office of Science Information Service acts to
identify deficiencies in current practices and to bring the 
concerned agencies together to coordinate Government in- 
formation programs for maximum effectiveness. 

Although reports have many shortcomings as primary in- 
formation media, their virtues of speed, economy and ease of 
creation and revision keep the reports system alive, especially 
in Government research and development programs. Since 
reports exist, they must be dealt with and their accessibility 
improved. It is generally agreed, however, that in the long 
run the scientific community will be best served by encourag- 
ing the flow of unpublished information into conventional 
scientific publication channels. 



Non-Government reports 

Considerable emphasis has been placed on Government 
reports problems because the Government and its contractors 
are the leading producers of reports. But the “unpublished”
problem does not end there. 

The company-published industrial report is of virtually 
no informational significance to the scientific world at large
unless copies are distributed and the information is an- 
nounced and made widely available. Such reports are often, 
if not usually, proprietary and remain so at the company’s 
discretion, completely in keeping with the principles of the 
private enterprise system of the United States. Attempts 
are now being made in the Office of Science Information 
Service to assess this problem more precisely and to propose 
remedies to whatever meaningful problems may be defined. 
The extent to which the information in company reports be- 
comes publicly available is the subject of an OSIS study now 
in process. 

The university or research institute report which is not presented
to a meeting or submitted to a standard scientific journal
may sometimes be incorporated in a bulletin of a university
experiment station or the special journal of a company or
research institute. Nonconformity again may cripple such
documents because they often creep into being in short supply
off the beaten track of most information gathering individuals
or organizations.

There are noteworthy exceptions to the general restrictions 
on accessibility of information in reports, university bulletins, 
and institute journals. Among the outstanding exceptions 
are the four-part Journal of Research of the National Bureau
of Standards, the Bell System Technical Journal and the
Battelle Technical Review.

Scientific meetings

Oral reports at scientific meetings, conferences, or symposia 
may become much more widely known than the reports from 
which they stemmed. Abstracts of such reports are often
freely published before or after meetings. Many such papers
are subsequently published in primary journals or in proceedings
volumes but only after lengthy delays. The Foundation
is also supporting studies to determine the extent to which in- 
formation reported orally at meetings escapes the permanent 
publications record and the reasons therefor. 

Because scientific meetings in themselves are important
channels for disseminating information, the Foundation supports
publication by the Library of Congress of a list of future
international and foreign scientific meetings. It is assumed
that such meetings are less widely known than are
purely domestic meetings. The Library concentrates on obtaining
for its collections all available documentation about
the meetings, including programs, preprints, abstracts of papers,
and published proceedings.





IV 
PRIMARY AND SECONDARY PUBLICATIONS 


Although primary and secondary publications are distinct- 
ly different media in almost all respects except that they con- 
tain scientific and technical information and are usually 
published periodically, these basic types of information media 
are so closely associated and so interdependent they will be 
discussed together. They constitute the backbone of the 
worldwide system of information dissemination, storage, and 
retrieval which is conventionally referred to as publication. 
The principal elements of the system, as reflected in part in 
figure 2, are— 

1. Primary journals—the principal media in which scien- 
tists first report the results of their research. In the basic 
sciences, most of these periodicals are published by scientific 
societies; in the applied fields, many are commercial journals.
Solving the basic scientific information problem is compli-
cated by the fact that published papers, in addition to being a
medium for disseminating knowledge, have become the major
yardstick by which a scientist’s professional stature is judged. 

2. Abstracts—a very important form of secondary publica- 
tion, when issued separately from the parent papers. 
Usually they are published as abstracting journals but some 
are card services. When read currently, abstracts assist a 
scientist in keeping up with his particular fields of interest;
as compilations, if adequately indexed, they greatly facilitate 
retrospective search of the literature. 

3. Indexes—another form of secondary publication, serv- 
ing the user as a most important tool for searching the liter- 
ature. 

4. Bibliographies and other compilations—still other types 
of secondary publication, whose natures are well known. 

5. Reviews—state-of-the-art papers or monographs which 
ordinarily summarize and sometimes evaluate progress up to 
a certain point and thus provide the research scientist with 
new points of reference or a useful means of entry into a new 
research topic. 

6. Books—monographic treatments which may be either 
primary or secondary publications; the former, compared 
with primary journals, are usually less current, but more 
reliable, authoritative, and comprehensive. 

This six-part publication system evolved over a period of
many centuries, has been understood and generally well liked
by scientists, and, for the most part, has served them reason-
ably well. Its very nature has kept quality control responsi-
bility largely in the scientists’ own hands. For years, its
capabilities for recording and disseminating information
about kept pace with the world’s output of new scientific
knowledge. Up to about the time of World War II, research
and publication appeared to be substantially in equilibrium.










Expansion of research and its publication 


Disruption of the fairly even state of equilibrium between 
research and publication, and the resulting complex and diffi-
cult information situation that now faces us, has stemmed
chiefly from the spectacular and still accelerating rate of 
growth in science and technology and the resulting increase 
in the quantity of scientific knowledge that must be con- 
trolled bibliographically if we are even to come close to find-
ing a satisfactory answer to the question, posed earlier: “How
can all results of all scientific research be made readily avail- 
able to all scientists?” Two or three examples serve to illus- 
trate the magnitude of the burden which publication is 
attempting to carry. The science and technology collections 
of the Library of Congress have doubled approximately every 
20 years for the past century and now number more than 1.5 
million volumes of books and periodicals. Statistics from
other sources indicate that research literature doubles every
8½ years. Figures relating to the growth of scientific jour-
nals in all disciplines reveal that the 1924 edition of the 
World List of Scientific Periodicals cited about 24,000 titles, 
the 1952 edition gave 50,000 and it is estimated the total will 
reach 100,000 by 1979. 
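
The World List figures just cited can be turned into an implied
doubling time. The following is only a minimal sketch, assuming
steady exponential growth between the cited dates; the function
name doubling_time is illustrative and does not come from the
report.

```python
import math

def doubling_time(count_start, count_end, years):
    """Years needed to double, assuming steady exponential growth."""
    return years * math.log(2) / math.log(count_end / count_start)

# World List of Scientific Periodicals: about 24,000 titles in the
# 1924 edition and 50,000 in the 1952 edition.
observed = doubling_time(24_000, 50_000, 1952 - 1924)

# Estimate cited in the text: 100,000 titles by 1979, that is, one
# further doubling after 1952.
projected = doubling_time(50_000, 100_000, 1979 - 1952)

print(f"1924-1952 doubling time: {observed:.1f} years")
print(f"1952-1979 doubling time: {projected:.1f} years")
```

On these figures the number of journal titles doubles roughly
every 26 to 27 years, somewhat more slowly than the 20-year
doubling reported for the Library of Congress collections.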


Increased costs of publication 


The serious, and in some cases almost paralyzing, effects of 
rising costs of controlling scientific information are another 
factor closely related to the growth in the quantity of litera- 
ture. Part of this problem results simply from the fact that 
costs for editing, composing, printing, and distributing publi- 
cations have behaved in recent years like the costs of every- 
thing else. Important also, however, has been widespread 
lack of recognition of the principle that research is not com- 
plete until its results have been recorded in a manner that 
will make and keep them readily available. Vastly increased 
support for the laboratory phase of scientific research has 
seldom been accompanied by corresponding provision for 
insuring the dissemination and continuing availability of the 
knowledge which is the principal product of this experimen-
tation. Had such provision been made as a routine part of
every research project, the scientific information problem 
today would be very much less serious than it is. Efficient 
and adequately financed reports-producing units, when they 
exist, merely beg the question. 


Interdisciplinary publication problems 


One important contributor to the seriousness of the scien- 
tific information problem is a factor that is inherent in the 
advance of scientific knowledge and does not stem primarily 
from the rapid growth of the literature, although unquestion- 
ably this has aggravated its effect. This factor is the in- 
creased, and increasing, extent to which a productive research 
scientist in any given field must be familiar with develop- 
ments in other fields. The conventional system of subject 
areas or disciplines is, of course, a strictly manmade break-
down—an arbitrary pattern of pigeonholes which has been
most useful but is in no sense preordained. Primary jour- 
nals, abstracting and indexing services, bibliographies—in- 
deed, all elements of the standard bibliographic control appa- 
ratus—have been established within and are keyed to this 
pattern of pigeonholes. But nature, as it were, is uncon- 
cerned with manmade subject divisions, and scientific phe- 
nomena occur as they will without giving any prearranged 
scheme the least consideration. As man’s knowledge of 
science and its phenomena grows, every scientist finds his 
research interests encompassing more and more specialized 
subject areas; also, increasingly, new subjects have to be in- 
serted between or in the middle of old ones. Witness, for 
example, the rapid growth in importance in recent years of 
such fields as biophysics, biochemistry, geophysics, geochem-
istry, chemical physics, physical chemistry, and the like.
Bibliographic tools have to meet greatly augmented needs to 
satisfy today’s scientist in this new environment. Primary 
journals must be aimed at audiences of greater heterogeneity, 
abstracts must serve much broader groups of readers, indexes 
must be expanded and made responsive to the requirements 
of scientists in many fields instead of just one or two, and so
forth. The virtues and the faults of “subject splintering” in 
science are reflected in publications where “splintering” is 
greeted with mixtures of enthusiasm and distaste. 


Publication problems of primary journals 


On the dissemination side, primary journals have been un- 
able to expand rapidly enough to keep up with the increas- 
ing flow of papers. As a result, many have accumulated 
large arrearages of editorially accepted but unpublished 
manuscripts, publication delays have occurred, and indexes 
have fallen behind or been omitted. Financial problems 
have become increasingly severe as rising subscription rates 
and society dues approach the point of diminishing returns.
Expanding interdisciplinary interests have complicated the 
problem of where a given paper should be published, with 
several journals in quite different fields frequently ap-
pearing almost equally appropriate. Also, in scientific fields
where the material to be published involves complex mathe- 
matical expressions, competent typesetting facilities have be- 
come saturated. 


Problems of secondary publications 


Abstracting and indexing services, for which potential 
workload is determined not by themselves but by the primary 
journals, are in a similar but probably more difficult situ-
ation. In addition to being faced simultaneously with more
and more material to be covered and higher and higher costs 
of covering it, indexing services find their task becoming in- 
creasingly complex. The more abstracts a service publishes, 
the more elaborate its indexes must become; the more nearly 
complete its coverage becomes, the greater are its problems 
of acquisition, translation, and the like; as a field of science
becomes more complex, indexing its abstracted publications
becomes more complex. The financial difficulties of these 
services are aggravated by the fact that they have fewer 
sources of income available to them than do primary jour- 
nals. Some have remained solvent simply by letting their 
current budgets control the extent to which they cover their 
fields—economically sound, perhaps, but obviously not an 
adequate solution from the standpoint of the promotion of 
scientific progress. Others, in attempting to be comprehen- 
sive, have found themselves in serious financial difficulties. 
Similar problems of coverage, complexity, and cost face the 
other secondary publication media. 

Improvement of U.S. abstracting and indexing services 

As the number and content of primary journals increase,
the role of the abstracting and indexing services becomes ever
more important. Scientists in most disciplines depend upon
abstract journals to keep up with their particular fields of 
interest. If adequately indexed, abstract journals greatly 
facilitate retrospective search of the literature. 

The Foundation’s responsibility to coordinate national 
scientific information activities necessitates conduct of a 
planned program to improve the abstracting and indexing 
of scientific literature in the United States. In carrying out 
this program, grants are awarded (1) to allow expansion of 
coverage by present services to match the increased output 
of primary journals, (2) to establish new services in subject 
fields not now covered, (3) to publish backlogs of material, 
or (4) to provide temporary support for abstracting or index- 
ing services having financial difficulties. In addition, steps 
are taken to coordinate the efforts of the various services. 
For example, the Foundation helped to establish the Na-
tional Federation of Science Abstracting and Indexing Serv-
ices, which strives to coordinate and improve the various
services.

The 15 abstracting and indexing services constituting the
present membership of the federation, listed on the attached
table (app. E), include most but not all of the U.S. services
which attempt to cover all the literature of a major scien-
tific discipline or subject area. The Foundation has been
meeting and working with many of these and others of the 
more specialized services to stimulate and assist them, where 
necessary, to improve services to scientists in their fields. It 
is to the credit of the abstracting services that they for the 
most part have taken up the challenge of the expanding vol- 
ume of literature in their fields and have greatly improved 
their services. 

The table gives the numbers of abstracts or title listings 
published by the members of the National Federation of 
Science Abstracting and Indexing Services during calendar 
years 1958 and 1959, and the numbers they expect to publish
in 1960. Although there is a good deal of individual varia-
tion, the grand total estimated for 1960 represents an overall
increase of approximately 19 percent over the number
published in 1958. 

The differences among the various services on the list are 
well illustrated by a comparison of Chemical Abstracts (CA) 
and Biological Abstracts (BA). The former covers, so far as 
is known, just about all the world’s literature in chemistry. 
Increases in numbers of abstracts published by CA represent 
increases in the world’s published chemical literature. The 
completeness and thoroughness of CA, which is considered 
the paragon of the world’s abstracting services, is due partly
to the large industrial complex associated with the field of 
chemistry. BA, on the other hand, attempts to cover a field 
almost as voluminous as that of chemistry, but until recently 
BA’s resources could only be stretched to cover about 25 per- 
cent of the world’s biological literature. National Science 
Foundation grants, starting in 1958, are assisting BA to ex- 
pand its coverage faster than the growth of the biological 
literature, and at the present rate BA will have achieved 
almost 50 percent coverage by 1960. Increases in size indi- 
cated for Nuclear Science Abstracts and Index Medicus reflect 
not only a growing literature but also the recent mechaniza- 
tion of certain nonintellectual processes in producing these 
services, which will enable them to process considerably more 
material. Mathematical Reviews and Applied Mechanics Re-
views define their areas of subject coverage rather specifically,
and the increases in size shown represent primarily growth
of the field and the increased ability to obtain obscure or 
marginal periodicals. The total number of abstracts pub-
lished by Meteorological Abstracts and Bibliography depends
partly on its success in obtaining funds. This service has 
been accumulating a backlog of unpublished abstracts, and 
hopes to obtain funds to publish this accumulation in 1960. 
In addition, Meteorological Abstracts is planning to expand 
its scope of coverage in the near future to include the bulk 
of geophysics.

Under a National Science Foundation grant, the Library 
of Congress has been compiling a bibliography of U.S. ab-
stracting and indexing services, in the course of which they
obtained information on 900 publications more or less of this 
type. The final bibliography will include about 450 scientific 
abstracting and indexing services. Among these, for ex- 
ample, is the new Geoscience Abstracts, which was begun in 
1959 as a successor to the Geological Abstracts. The new 
service published 3,202 abstracts in 1959 and expects to pub- 
lish 3,500 in 1960, whereas Geological Abstracts in 1958 
published only 1,559 abstracts. The American Petroleum 
Institute’s Technical Abstracts covers the refining and pet- 
rochemical phases of petroleum science and technology. Es- 
tablished in 1954, it now publishes about 10,500 abstracts per 
year. Plans are underway to initiate coverage of the litera- 
ture on petroleum exploration and production as well, either 
by expansion of the existing service or by establishing a new 
one. 
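
The Geoscience Abstracts figures quoted above can be restated as
percentage changes. The following is a minimal sketch that uses
only the three counts given in the text; the helper name
percent_change is illustrative and not part of the report.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

geological_1958 = 1_559   # Geological Abstracts, published in 1958
geoscience_1959 = 3_202   # Geoscience Abstracts, published in 1959
geoscience_1960 = 3_500   # Geoscience Abstracts, expected for 1960

first_year = percent_change(geological_1958, geoscience_1959)
planned = percent_change(geoscience_1959, geoscience_1960)

print(f"1958 to 1959: {first_year:.1f} percent")
print(f"1959 to 1960 (expected): {planned:.1f} percent")
```

The successor service thus roughly doubled the output of its
predecessor in its first year, with a further increase of about
9 percent expected for 1960.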




The dilemma of the research scientist 


The research scientist—the ultimate consumer of scientific 
information—finds himself running faster and faster in his 
attempts to stay in the same place. To the extent that pri- 
mary and secondary publications expand their services, he 
has more primary journal papers to read, more abstracts and 
index entries to check, more bibliographies to look through, 
and so forth. This would have posed a serious problem for 
him even if the scope of his research interests, in terms of 
conventional disciplines, had remained about the same. But, 
as already noted, it has not. More and more he is finding 
that discoveries and developments of great potential perti- 
nence to his work may turn up in research areas far removed 
from his specialty. Two examples illustrate this point. 

A Bureau of Mines announcement of some months ago 
dealt with the extraction of metals, like copper and manga- 
nese, from low-grade ores by bacterial action. Microbiology 
and biochemistry would not seem to be high priority research 
areas for mining and metallurgy scientists to follow. Yet, it 
was research in these fields that brought to light a kind of 
bacterial action which offered a completely new approach to 
the recovery of metals from low-concentration ores. A sec-
ond example, which goes back some years, concerns the use
of chromatography for separating plant materials. The 
basic research was performed by the botanist Tswett in 1906, 
and he, quite naturally and properly, published his findings 
in a botanical journal. It was not until 25 years later that 
Tswett’s work came to the attention of chemists. When in 
the 1930’s, the process was applied commercially to the sep- 
aration of plant pigments, it very quickly became an ex- 
tremely valuable chemical tool. 

Thus, while on the one hand those directly responsible for 
the various forms in which scientific information is published 
quite properly strive for more complete coverage of research, 
the scientists for whom the information is being published 
are increasingly swamped. (Aspects of this problem per- 
taining to information requirements and storage and search 
are discussed in the section, “Research on Scientific Informa- 
tion Problems.”) 


Some approaches to solutions 


The situation outlined above has developed over a period 
of time during which the growing problems of scientific pub- 
lication not only were viewed with concern but were attacked 
on a number of fronts—albeit until recent years on a limited
scale and with insufficient coordination. Most significant of 
the recent activity is increasing recognition of the informa- 
tion problem by the professional societies and other major 
groups in the various scientific fields. The most encouraging 
aspect has been the willingness of the groups involved to go 
back to first principles and to think the information and pub- 
lication problems through from the beginning, as nearly as 
possible without prejudice for or against the particular pub-
lication pattern that happens to have evolved. To this
end, to name examples, extensive studies of physics and 
biology documentation are underway by the American Insti- 
tute of Physics and the American Institute of Biological 
Sciences. These projects are discussed in detail in the sec-
tion, “Research on Scientific Information Problems.” 

Related kinds of investigation are being made for chem- 
istry by Chemical Abstracts and other parts of the American 
Chemical Society; the pharmaceutical industry has set up an
information study group for its field; a continuing Con- 
ference of Biological Editors (CBE) has been organized 
and is investigating specific problems associated with bio- 
logical journals; and so forth. Many of these activities were 
initiated at the suggestion of the National Science Founda- 
tion, under its responsibilities as defined in the National De- 
fense Education Act, and most of them are supported wholly 
or partially by the Foundation. The questions to which an-
swers are being sought in these programs are fundamental
and might be stated as follows: 

1. How can primary journals obtain more significant papers
and eliminate unimportant ones?

2. Exactly what are the scientists’ needs in scientific pub- 
lication and what kind of system will best meet these? This 
question is closely associated with data and reference centers 
and documentation research discussed elsewhere in this paper. 

3. What pattern of cooperative operation among primary 
journals and abstracting and indexing services will give the 
scientist maximum service at minimum cost in time and 
money?

4. What economies might be achieved, without seriously
affecting quality of service, by the use of new techniques
in composition, printing, indexing, distribution, and the like?

5. What are promising areas in which research ought to 
be conducted looking toward the development of still other 
techniques in these fields that would lower cost and/or im-
prove effectiveness?

Short range remedial measures, in contrast to the long 
range attacks outlined above, tend to be palliative rather 
than curative. They are essential, however, to maintain pres- 
ent publication services in critical areas. The short range 
aspect of the Foundation’s publications program includes 
grants to enable primary and secondary journals to weather 
temporary financial crises, for experimenting with new types 
of publication, for initiating new research journals, for pub- 
lishing significant scientific monographs that are not com- 
mercially attractive, and for preparing and publishing special 
indexes, bibliographies, and reviews. Also being conducted 
or supported are studies whose results will be beneficial both 
to the Foundation in making its publication grant program 
effective and to scientific societies and journal managers in 
planning their activities. These concern such things as so- 
ciety dues structure; editorial, publication, and subscription 
policies of journals; publication climate and practices in re-
search laboratories; relation of Federal support of research to
Federal support of publication; and the like.

Steps which various scientific societies and other segments 
of the research community have themselves taken or are tak- 
ing include the obvious one of raising dues and subscriptions 
(a measure that can only be carried so far); exploiting more
fully other sources of income (page charges, for example);
requiring severe condensation of papers (excellent, if not car-
ried to the point of elimination of useful information); intro-
ducing lower cost processes in some areas; and the like.

All of these programs and projects—both the long range 
and the short range—indicate progress in the attack on the 
scientific publications problem. Perhaps most important, 
however, are the signs of a growing awareness that the prob-
lem is serious and must be solved if the Nation’s high place
in the scientific world is to remain secure. 


DATA AND REFERENCE SERVICES 


A large percentage of the significant unclassified scientific 
and technical information resulting from the research of 
Government, industrial, educational, and other private or-
ganizations is not made available through conventional jour-
nal and book publication channels. Even though such infor- 
mation may eventually become published, there are extensive 
time lags, varying from months to years, between the time 
research is completed, the time the results are submitted for 
publication, and the time the information actually appears in 
a scientific journal or book. 

The program of the Foundation in this area has two major 
objectives: (1) to provide for systematic optimum public 
announcement and dissemination of all significant unclassi- 
fied scientific and technological information which is not 
published promptly in scientific journals and books, and to 
encourage its flow into conventional scientific publication
channels; and (2) to help make scientific and technical infor- 
mation and data more easily accessible to the Nation’s scien- 
tific and technological community by means of a well coordi- 
nated system of conventional and specialized reference, data, 
and information service centers. 

The first area of activity outlined above is discussed in de-
tail in section III, “Problems at the Source.” The second
area of data and information centers is considered here. 


Specialized data and information centers 


In the last decade, several thousands of specialized data 
and information centers have developed in different parts of 
the world, including an estimated 3,000 in the United States.


The specialized information center is organized to pro- 
vide services over narrow segments of particular disciplines 
and technologies, and sometimes including critically eval-
uated data. Illustrative of this widespread development in
scientific information service are the specialized and fre-
quently elaborate scientific and technical information groups
which provide for the internal services of large industrial 
corporations, especially in the chemical, oil, and electrical 
industries. Many specialized information centers have also 
been established for support of Government research and de- 
velopment contractor activities; for example, there are more 
than 50 known such centers operated or supported by the 
U.S. Department of Defense. These include the Defense 
Metals Information Center, Plastics Technical Evaluation 
Center, Prevention of Deterioration Center, Solid Propel- 
lants Information Agency, Radiation Effects Information 
Center, and others. 

It is important to note that the effectiveness of specialized 
information centers depends in large part upon the prompt-
ness and comprehensive coverage of the basic abstracting and
indexing services, which provide the raw material for their
specialized operations.

Needs of scientists for specialized services 

The requirements of scientists for specialized information 
services vary greatly, depending upon the orientation of the 
user’s work. For example, much of the information needed
by the scientist in basic research is likely to be found within 
one of the conventional disciplines—or major subdisciplines 
such as biochemistry—and the degree to which his needs are 
satisfied is a measure of the scope and quality of the informa- 
tion services within his discipline, services generally devel- 
oped cooperatively by the members of his particular profes- 
sion. On the other hand, the research worker in product 
development is likely to have more interdisciplinary infor-
mation needs which are more difficult to satisfy. In the latter
case, if the particular activity is broad in scope, occupying
large numbers of geographically scattered scientists and tech-
nologists, effective dissemination of information becomes an
especially acute problem. This group is dependent not only
upon effective information services in the fundamental dis- 
ciplines, but also upon an equally effective secondary system 
which must draw pertinent information from various disci- 
plines, reprocess it, and combine it with information gener- 
ated by the group itself. The very specialized orientation of 
these secondary systems is further emphasized by the fact 
that normal information processing is frequently supple- 
mented with continuing technical evaluation and state-of- 
the-art appraisals by scientists and engineers working di- 
rectly in the program or project. It is this type of situation 
which has in recent years encouraged the somewhat spon- 
taneous and widespread growth of specialized information 
centers both within the Government and among industrial 
and other private organizations. 

Still another type of need which can be met by the tech- 
nique of the data and reference service is that for information 
concerning current or proposed research. Effective planning 
and coordination of scientific research programs depend on
prompt, accurate, and complete knowledge of who is doing or
proposes to do what work, and where. For example, the
present Bio-Sciences Information Exchange, supported by 
several Federal agencies and administered by the Smith- 
sonian Institution, provides this type of specialized service 
in the biological and medical sciences. Personal communica- 
tion and prompt exchange of information between research 
investigators are also promoted by this type of service. 

There can also be a geographical aspect to the need for in- 
formation, thus creating need for specialized information 
services. If a body of information users shares scientific in- 
terests in a particular geographical area—for example, the 
Pacific Ocean and its islands, the Antarctic regions, polar 
regions, or the like—then interdisciplinary data and refer- 
ence services are required of the specialized, secondary type 
previously mentioned. Users may also be physically located 
in a common geographical area, such as a municipality, State, 
or region, but without related subject interests. 


Mechanized systems


Many different storage and retrieval systems are already 
being utilized to produce the varied kinds of products de- 
manded of data and reference services—literature citations, 
data or narrative answers to specific questions, information 
from one discipline or from many, information about a 
region or for a region, etc. Many such systems are mecha- 
nized, and extensive study and research are being directed 
toward improved methods of mechanization so that these
data and reference services can most effectively and economi-
cally satisfy the information requirements of their respective
bodies of users. These aspects of the problem—systems de- 
velopment and mechanization research—are discussed in 
more detail in section II on “Research in Scientific Infor- 
mation Problems.” 


Problems of coordination of services 


Aside from systems and mechanization research, much 
needs to be done to integrate existing and new data and ref- 
erence services into an effective national instrument. Un- 
necessary duplication must be eliminated. Existing services 
must be reexamined and possibly reoriented and expanded 
on the basis of informed appraisals of probable future in- 
formation requirements. Existing facilities must be widely 
publicized and fully utilized to insure against the establish- 
ment of new services which inadvertently duplicate or over- 
lap. New services should be developed where there is firm 
evidence that substantial user requirements are being neg- 
lected. Overall, there must be emphasis on maximum com- 
patibility among data and reference services—large and 
small, general and specialized, national and local. 


The role of the Government in improvement of services 


The relative financial responsibilities of the Federal Gov- 
ernment and of private industry for support of data and ref-
erence services need to be explored further. It is clear that
the Government alone cannot underwrite all essential or
worthwhile services of this type. The Foundation follows 
the general principle that the primary responsibility for
their long-term financial support rests with the users and 
beneficiaries of such services. The limited funds of the 
Foundation can be used most effectively in this area as “seed” 
money to stimulate the development of new or modified sci- 
entific information products, systems, and services; only tem-
porary assistance should be provided to established opera-
tions where the need for such assistance can be clearly
demonstrated and is considered serious. 

Because of the differing types of requirements for data and 
reference services and the various economic aspects of the 
problem, the Foundation places different degrees of emphasis 
on its efforts to improve the situation. For example, con- 
sistent with its own long-range views and the recommenda- 
tions of the Science Information Council, top priority is 
assigned to the improvement of information services in the 
broad disciplines and technologies. Since the success of 
specialized or regional services depends upon the effectiveness 
of the fundamental scientific information structure, improve- 
ment in the latter area is of paramount importance. Founda- 
tion-supported programs by Chemical Abstracts, the Ameri- 
can Institute of Physics, and the American Institute of 
Biological Sciences have been mentioned, as well as the 
program of the American Society for Metals for development 
of an improved information service in metallurgy and related 
fields. The need for permanent and improved geophysical
data and reference services is also being examined. On 
narrower fronts, consideration is being given, for example, 
to a national information service on fungus cultures. In the 
area of critically evaluated data, the Foundation supports 
the coordinating activities of the Office of Critical Tables of 
the National Academy of Sciences and is appraising specifi- 
cally the crystallographic data situation. Interest is being 
stimulated wherever possible among the disciplines and tech- 
nologies to reappraise their user information requirements 
from a long-range viewpoint and to develop new and im- 
proved data and reference services as warranted. 

Although the Foundation does not propose to provide 
financial support for the long-term operation of highly 
specialized data and reference services, it does consider fi- 
nancial support of study and development in any 
specialized area if the results are apt to be of general interest 
or of application to other specialized areas. An inventory 
of such specialized services has been contracted for with the 
Battelle Memorial Institute, with a view to publishing a 
national directory of their scope and location. Detailed 
information about such services will be collected and analyzed 
as a preliminary to maintaining an effective national ad- 
visory clearinghouse to serve planners of new data and refer- 
ence services. Both the directory and the clearinghouse are 
expected to be of particular value in the face of mounting 
pressures for specialized data and reference services asso- 
ciated with defense programs, and in particular in relation to 
current urgent needs for better accessibility and dissemina- 
tion of materials research information. 

With respect to regional data and reference services, the 
most difficult area for clear delineation of responsibilities is 
that involving centralized service to research and develop- 
ment facilities within a specific region. In most cases, it is 
clear that the local beneficiaries of a regional service should 
bear the financial burden of developing and operating it. 
However, as with specialized subject centers and services, 
certain studies appear to be needed to clarify the substantive 
and economic aspects of such facilities. Where such study 
promises to be helpful on a broad basis with respect to typical 
regional services, the Foundation will consider giving sup- 
port to the study. 

Science Research Information Exchange 

Priority is also being given to the problem of improved 
information services regarding current and proposed re- 
search in all scientific areas. The Foundation is leading 
efforts among the Government agencies toward the establish- 
ment of a Science Research Information Exchange to collect, 
correlate and disseminate information and data about re- 
search tasks, publicly or privately supported, in the mathe- 
matical, physical, engineering, life, and social sciences. Al- 
though focusing initial attention upon federally supported 
research and providing service to Federal agencies, the Ex- 
change would be progressively developed to cover and service 
national scientific research interests. 

The intent of such an information exchange is to provide 
a facility for the collection and correlation of administra- 
tive as well as scientific intelligence about current research 
activities before they have progressed to the stage where re- 
sults are documented, published and disseminated. In this 
respect the scope of service differs completely from con- 
ventional scientific information activities which begin their 
process at the point where accounts of research are first re- 
corded. 

Increased scientific research activity in recent years has 
created a growing need for this unique type of service in all 
areas of scientific research—the life, physical and social 
sciences. Those who plan, coordinate and manage research 
programs need to know who is doing or proposes to do what 
research, to what extent, for how long and where. This 
knowledge, if promptly available on an up-to-date basis, 
helps to prevent duplication, overemphasis or serious gaps in 
subject areas of research. It also promotes optimum utiliza- 
tion of research funds and other resources. 

There is a second valuable dividend from this type of 
clearinghouse. It facilitates rapid, personal communica- 
tion between research investigators working in closely re- 
lated subject areas. Additional potential benefits can be 
visualized from such a service after it has become fully oper- 
ative. For example, the input to the service could be ex- 
panded ultimately to record the scientific publications which 
have resulted from individual research tasks. The prompt 
and complete availability of such information would facili- 
tate a comprehensive announcement service superior to the 
present means of announcing what new research knowledge 
has been added to the literature, where and by whom. At 
present, it is difficult to foresee all the long-term potentiali- 
ties of such a clearinghouse system; however, the horizons 
for additional future services and benefits appear now to be 
very broad, if the system to be established is carefully 
planned and kept sufficiently flexible for orderly and maxi- 
mum growth. 

A mechanism of this type already exists for biological and 
medical sciences research in the form of the Bio-Sciences 
Information Exchange, which is cooperatively supported by 
several Federal agencies and administered by the Smith- 
sonian Institution. The services which BSIE has performed 
in its several years of operation demonstrate the needs, feasi- 
bility and value of the information exchange on current re- 
search. Further details on the establishment of the Physical 
Sciences Information Exchange are found in appendix D, 
attached. (See pp. 142-144.) 


VI 
FOREIGN SCIENCE INFORMATION 


Research carried on in other countries is increasing in vol- 
ume and significance and thus is of growing importance to 
American scientists and engineers. However, much of this 
information is not directly available because it is published 
in languages which few American scientists read—Russian, 
Chinese, Japanese, and Eastern European languages. In ad- 
dition, many foreign language journals are not readily avail- 
able in American scientific libraries. 

The language difficulty is reflected in the fact that more 
than one-third of the world’s scientific and technical litera- 
ture is produced in the U.S.S.R., China, and Japan. Statis- 
tics indicate this material can be read by less than 2 percent 
of U.S. scientists and engineers. It has been estimated that 
about 50 percent of all scientific and technical literature ap- 
pears in English, but at least one-third of the balance of the 
world’s literature appears in unfamiliar languages and is a 
closed book to U.S. scientists unless it can be approached 
through abstracting, indexing, and translating. 

The effectiveness of communication of scientific informa- 
tion with foreign countries is determined by factors of sev- 
eral kinds: 

1. Political, particularly where different limitations are 
imposed by different governments on the export and import 
of publications, visits of scientists, and free correspondence. 

2. Economic, where both the quality and quantity of the 
scientific information output of a country is to a considerable 
extent determined by its economic status and the level of its 
scientific and technical development. 

3. Cultural, where different cultural and educational pat- 
terns, traditions, and ideologies cause different approaches to 
scientific research and different interpretation and manner of 
publication of the obtained results. 

4. Organizational, where the organization of scientific re- 
search, publication, and information services peculiar to each 
individual country determines the pattern of communication 
of scientific information of that country. 

These factors need to be known and understood not only by 
administrators and information specialists, but also by scien- 
tists themselves to enable them to determine (a) in which 
foreign areas research demands their concentrated attention 
and (b) the most expedient means of communication for their 
use. 

While some of the political and economic effects on foreign 
scientific research and information are exposed in the daily 
press and various science news journals, the intricacies of par- 
ticular foreign research achievements, publications, and in- 
formation services require special study. The main objec- 
tives of such studies are— 

(1) to describe the “state of the art” of foreign scien- 
tific research; 

(2) to examine the communication patterns of foreign 
scientific information; 

(3) to provide guides to foreign research institutions 
and scientists; and 

(4) to provide guides to foreign publications and in- 
formation services. 

Some universities and research institutions have under- 
taken to explore the more general aspects of foreign re- 
search, including scientific education and the financial, philo- 
sophical, and social implications of research. Most of these 
studies are supported by the National Science Foundation 
or other U.S. Government agencies and are frequently ori- 
entated toward the Soviet Union. To assess foreign scien- 
tific information output and information services, the Foun- 
dation is sponsoring several studies of information activities 
in individual countries. Studies of scientific information 
activities in Japan, Poland, and Indonesia are in process. 
Also in preparation are a “Guide to Information Activities in 
International Organizations” and a source file of Soviet 
scientific information. Wherever feasible, reports of these 
studies will be published in order to assist scientists and in- 
formation specialists to develop their own channels of 
communication. 

In addition to the study programs on scientific research 
and information developments abroad, similar studies are 
needed to identify the requirements of U.S. scientists for 
foreign information. The U.S. needs and requirements vary 
from field to field depending on the current U.S. interest in 
the subject area under consideration and the information 
habits of the particular communities of scientists. Further- 
more, many of the present practices of disseminating for- 
eign scientific information are still relatively new, and there- 
fore there are as yet no reliable measures of their effectiveness. 

International exchange of scientific information.—Individ- 
ual scientists, scientific societies, and scientific libraries and 
institutions are the traditional agencies for the conduct of 
international exchanges. All countries with major research 
and development activities have hundreds of institutions and 
organizations with scientific and technical publications for 
sale and exchange. 

Historically, the U.S. Government has supported the ex- 
change of scientific information with institutions through- 
out the world. The Smithsonian Institution was created 
“to diffuse knowledge”; its International Exchange Service 
was created to service American scientific institutions inter- 
ested in international exchange. The Library of Congress 
likewise has had a prominent place in the development of 
international exchanges. Long series of executive agree- 
ments for the exchange of Government publications have 
clearly established the intent of successive administrations 
and Congresses. 

Within the Federal Government, some 22,000 exchange 
agreements are in force in the three national libraries— 
the Library of Congress, the National Library of Medi- 
cine, and the National Library of Agriculture. Although 
precise figures for individuals and organizations inside and 
outside the Government are not available, it is known that 
other Government agencies, university and other libraries, 
professional societies, industrial laboratories and individual 
scientists maintain many thousands of other agreements for 
exchange of scientific and technical information with their 
counterparts in all foreign countries. Generally, these agree- 
ments cover the exchange of formal publications such as 
journals and books, but much information is also exchanged 
informally by individual scientists and by laboratories 
through reports, reprints, and correspondence. Significant 
information is exchanged also through the media of inter- 
national scientific meetings, visits by scientists to foreign 
countries, and by exchanges of scientists. 

The United States has achieved two major exchange-of- 
publications agreements with the Soviet Union. The first 
is a statement of principle embodied in the Lacy-Zaroubin 
agreement and carried over into the Thompson-Zhukov 
agreement. This remains an unimplemented statement of 
policy. The agreement between the two academies last year 
has similarly not been implemented. The present level of 
exchange activity is slight compared to the potential im- 
plied in past agreements and slight compared to U.S. ex- 
change with other countries. 

Scientists, individuals, and Government and private insti- 
tutions wishing to engage in the large-scale exchange of 
publications are sometimes handicapped by lack of informa- 
tion concerning their colleagues and counterpart organiza- 
tions and their publications in foreign countries. A further 
complication is that publications of Soviet bloc countries 
are all government-produced and centrally issued, while U.S. 
scientific publication is mixed public and private and is high- 
ly decentralized. This fact imposes major difficulties in of- 
fering a quid pro quo exchange of current scientific publica- 
tions between institutions in the two areas. 

There is need to establish an international exchange center 
to function as a clearinghouse for information concerning 
exchanges and to assist in broadening the flow of interna- 
tional exchanges with directory-type information and to 
advise domestic or foreign institutions in the possibilities of 
publication exchange. The Foundation is considering means 
of meeting this need through establishment of an advisory 
center which would facilitate international exchange of scien- 
tific publications and which would also take into account the 
need for multiple copies of foreign publications for distri- 
bution to the research libraries of this country. 

Scientific societies receive and publish communications 
from scientists or from comparable societies in other coun- 
tries, and regular exchange of publications sometimes follows. 
American Mathematical Society exchanges have helped build 
the collections of Brown University, for example. But as 
there are many scientists who cannot lean on personal com- 
munication for international information exchange, there are 
also professional societies which feel they cannot afford to 
transmit their own publications abroad in exchange, nor 
maintain facilities for servicing or storing scientific materials 
they would receive in return. The National Science Founda- 
tion is actively studying this situation and plans to work 
cooperatively with the scientific societies in stimulating and 
expanding society and institutional exchanges. 

A recent development in international communication is 
the increased role played by U.S. business and industry. 
Major American companies have established listening posts 
in various European countries; have entered into information 
exchange and cross-licensing arrangements with European 
firms; support research in European universities and re- 
search laboratories; have set up laboratories in European 
countries; and support the costs of U.S.-manned information 
centers abroad. To the extent that these U.S. business activ- 
ities abroad involve basic research, the information product 
is widely and publicly available. As the subject matter tends 
more toward application and development, the information 
produced becomes more a matter of competitive advantage. 
Beyond its usefulness to the company which paid for the 
information, there are long-range benefits of considerable 
importance in increasing U.S. knowledge and understanding 
of foreign science and technology. 

In the last 10 years UNESCO has been conducting a pro- 
gram designed to improve the institutional exchange of pub- 
lications among all countries. The UNESCO program has 
been modest, not oriented scientifically, and directed toward 
assistance to underdeveloped countries. The private non- 
profit organization United States Book Exchange, Inc., which 
in 1950 offered great promise for the increase of exchange 
of scientific publications with foreign countries, has been 
used but moderately through the decade. Of particular in- 
terest is its use by such agencies as the International Coop- 
eration Administration and U.S. Information Agency in ex- 
porting surplus American publications in the sciences and 
technologies to the underdeveloped countries. 


Primary publications abroad 


There is general recognition abroad, even in the Iron 
Curtain countries, that their scientific results have a greater 
chance of making an impact on the world if they are pub- 
lished in one of the basic international scientific languages. 
The scientific authorities of Poland, Czechoslovakia, Hun- 
gary, and even of Communist China, have approved the use 
of English as well as French, German, and Russian for those 
of their research publications most likely to be read abroad. 
In opposition to the widespread and increasing acceptance of 
English as the principal language of scientific publication, 
however, there are nationalistic trends such as in India 
where there are pressures to publish in Hindi rather than the 
traditional English. 

Independently of these naturally occurring trends, the 
National Science Foundation is developing, through Public 
Law 480 funding, a program to encourage cooperating coun- 
tries to publish more of their primary scientific journals in 
English. Two countries immediately affected are Poland and 
Yugoslavia. Additionally, the Foundation has taken steps to 
support an extension of the noteworthy Japanese effort to 
publish their scientific journals in English. In summary, 
there is a trend in scientific publication toward the universal 
use of English, and the Foundation is concerned with speed- 
ing this trend for the advantages which will accrue to United 
States science. 

International cooperation among abstracting services 


Progress in achieving better international coordination of 
scientific abstracting services has been slow. The program of 
the Abstracting Board of the International Council of Sci- 
entific Unions (ICSU), which is supported in part by the 
National Science Foundation, has had a modest success. Of 
more significance for the future are the program proposals 
made by the Foundation to the International Federation for 
Documentation (FID) for more active participation jointly 
with ICSU in the coordination of such services. The first 
step, that of establishing an information clearinghouse con- 
cerned with the characteristics of all national and inter- 
national abstracting systems, is about to be undertaken by 
FID. 

Domestically, the establishment of the National Federation 
of Science Abstracting and Indexing Services, referred to 
earlier in this paper, has been a major step in coordinating 
the efforts of American services. Individual members of this 
federation (e.g., “Chemical Abstracts,” “Biological Ab- 
stracts,” “Meteorological Abstracts”), have been paying par- 
ticular attention to the abstracting of foreign scientific jour- 
nals for U.S. and international use. The Foundation, for 
example, has supported the efforts of “Biological Abstracts” 
to increase substantially its coverage of the Russian scientific 
literature, and has taken every opportunity in discussions 
with foreign scientific groups to promote the flow of informa- 
tion to U.S. scientific abstracting services. A case in point 
was Foundation support of a visit in late 1959 by representa- 
tives of the national federation to the All-Union Institute of 
Scientific and Technical Information, U.S.S.R., which cen- 
tralizes the Soviet abstracting effort. The latter institute 
offered to supply the federation with English-language ab- 
stracts of all Russian scientific publications, an offer which 
for practical reasons the federation could not accept. 

Improved coordination of U.S. abstracting services in 
parallel with closer association with international interests 
offers a variety of opportunities for future international 
cooperation in abstracting, and these are being pursued 
whenever they arise or can be created. 

Scientific reviews 

Of considerable significance to research workers are inter- 
nationally oriented critical reviews, or state-of-the-art pa- 
pers, which summarize, digest, and evaluate reports of 
research bearing on particular scientific problems. The in- 
creased popularity and commercial success of review litera- 
ture has been a phenomenon of scientific publication over 
the recent past, and the Foundation has taken steps to aid 
this trend. 

The Foundation has, for example, supported the prepara- 
tion of special reviews on Soviet achievements in several sub- 
ject fields to be published in Annual Reviews. The Founda- 
tion has also sponsored the translation of Soviet reviews 
published by the American Institute of Physics as a separate 
journal (Progress of Physical Sciences), and both the Foun- 
dation and the National Institutes of Health have sponsored 
publication of review papers in existing U.S. review jour- 
nals. In addition, the Foundation has undertaken to dis- 
tribute to U.S. scientific societies for possible publication a 
series of 10-year progress reviews on the status of research 
in Communist China, as these appear in the available main- 
land Chinese publications. 


Translations 

Federal agencies and scientific societies have adopted 
translation as a major technique for informing U.S. scien- 
tists about oversea developments. This trend reflects in part 
the increasing awareness by U.S. scientists that the work of 
Russian scientists has present and cogent value for U.S. re- 
search. Criteria for translation programs should be and 
usually are twofold: (1) the material should have demon- 

strable scientific value, and (2) the language of publication 
should be one with which American scientists have little or 
no familiarity, including principally the Slavic and oriental 
languages. 

It is impossible to estimate accurately the total amount of 
money expended for translation purposes here and abroad, 
or the amount of translated material produced, but it is 
known that millions of dollars are being invested in systema- 
tic large-scale translation programs as well as in smaller in- 
formal ones, so it is important to analyze these programs 
briefly. 

Two organizations in this country, the Office of Technical 
Services of the Department of Commerce and the Special 
Libraries Association Center at John Crerar Library in Chi- 
cago, jointly hold some 40,000 individual translations and 
translated abstracts of an additional 50,000 articles. This 
material is collected, announced, and made available to the 
public through a program assisted by the National Science 
Foundation and other agencies. In addition, the Founda- 
tion supports 28 scientific societies and academic institutions 
in the translation of all or of very large parts of 39 primary 
journals and three Russian language abstract journals. The 
National Institutes of Health supports the translation of 9 
journals, U.S. private publishers translate an additional 25 
journals, and the United Kingdom Department of Scientific 
and Industrial Research supports the translation of 10 jour- 
nals—a total of more than 80 different Russian journals 
being rendered into English. In addition, the Foundation 
supports three series of collections of translations of papers 
selected from a range of Soviet journals. 

Commercial translating firms in this country also translate 
upon request for private clientele; U.S. business and industry 
patronize these firms, and in addition many assemble their 
own staffs of professional translators; dual translation ef- 
fort is supported by Government agencies for direct further- 
ance of their own statutory missions; similar translation is 
performed in private laboratories and universities. 

The bulk translation programs, of journals cover to cover, 
represent an attempt to identify the most important foreign 
research before it is published by analyzing the kinds of 
papers which have appeared in each journal nominated for 
this treatment. These journals, together with the transla- 
tion collection centers mentioned above, represent the best 
means developed so far to prevent inadvertent duplication 
of translation effort. As more and more U.S. scientists 
become better acquainted with Russian and other foreign 
research it should become easier to make knowledgeable se- 
lections of papers for translation from the great bulk of for- 
eign publication which appears in unfamiliar languages. 

An important adjunct to the Government effort to utilize 
more effectively the results of foreign research is the program 
of scientific information activities undertaken abroad by 
Federal agencies under Public Law 480 funds. The Foun- 
dation coordinates and administers the program (sec. 104k) 
under a directive from the Bureau of the Budget and an 
amendment to Executive Order 10560 dated January 15, 
1959. A total of $1,200,000 was appropriated by Congress 
for scientific information activities under this program in 
fiscal year 1959. The Library of Congress administers a sep- 
arate program (sec. 104n) on which the Foundation consults 
as to scientific information interests. Although a number of 
serious administrative and procedural difficulties exist, the 
Public Law 480 program offers the possibility of making a 
notable increase in the quantity of foreign information avail- 
able to U.S. scientists. Under Foundation sponsorship, a 
productive program is underway in Israel and new programs 
are starting in Poland and Yugoslavia. 


Appendix A 


PROJECTS IN THE GENERAL AREA OF MECHANIZED HANDLING 
OF INFORMATION SUPPORTED BY THE NATIONAL SCIENCE 
FOUNDATION 


Projects are listed by institution with the amount of the 
most recent grant, the time period the funds are to cover, and 
a brief description of the project. 

A. Projects directly related to the mechanization of in- 
formation retrieval systems and procedures.—These are con- 
cerned either with work directly on the mechanization of 
procedures that can be used in a mechanized system or with 
basic work with mechanization in mind. 

1. The American Institute of Physics, $29,700 (18 
months): Study of publication methods and problems in the 
field of physics. As a part of the study, the efficiency of per- 
muted indexing which is prepared with IBM equipment is 
being tested. Other areas of mechanization in the publica- 
tions field are also being investigated. 

2. Cambridge Language Research Unit (England), 
$35,650 (1 year): In conjunction with its research on new 
logicomathematical methods for the analysis of languages for 
machine translation (see item C-2 following), the group has 
experimented with information searching by using thesaurus 
techniques in conjunction with punched cards. 

3. Chemical Abstracts Service: Three grants for projects 
leading toward the mechanization of the processing and 
searching of chemical information. 

(a) $57,900 (1 year): Study of the semantic content of 
chemical literature with the aim of discovering how to handle 
semantic relationships in a mechanized searching system. 

(b) $69,800 (1 year): Investigation of mechanical aids to 
chemical documentation. Processing techniques and equip- 
ment will be reviewed, preparatory to the development and 
experimental testing of applicable systems and methods. 

(c) $150,000 (1 year): A test of the usefulness to chemists 
of a permuted index of paper titles. Data will be punched on 

tape and processed by computer. The output will result in 
an index of titles, authors, references, and keywords. 

4. Federation of American Societies for Experimental 
Biology, $15,000 (1 year): Testing mechanized programing 
of papers given at large scientific meetings. They are sched- 
uled by subject matter, and in such a way as to avoid conflicts 
with topics in other sessions at the meeting. It is hoped 
that the resulting classification of subject matter on punch 
cards will serve to generate a subject index to all papers. 

5. George Washington University, $14,000 (1 year): In- 
vestigation of the use of computers for preparing a coordi- 
nate key word index in tabular form, once the index entries 
have been selected by human indexers. 

6. Itek Corp., $143,000 (21 months): Research to develop 
basic methods and supporting procedures principally in the 
ee and representation of index data from natural 
language expressions in documents. Results are to be ap- 
plicable to operation of high volume, high quality documen- 
tary systems. 

7. National Bureau of Standards, $22,900 (1 year): Study 
of multiple relations in information retrieval systems. One 
aspect of the project is the study of the problems of system 
design where the tasks of location and manipulation of data 
may be variously assigned to several devices or machines 
operating interdependently. 

8. The New York Botanical Garden, $4,750 (1 year): 
Pilot study of the application of electronic data processing 
devices to plant taxonomy. 

9. University of Pennsylvania, $321,800 (2 years): Re- 
search on the possible application of techniques of linguistic 
analysis to scientific texts, directed toward the development 
of computer programs for analyzing English sentences, and 
indexing and abstracting texts. This work may also have 
application in the field of mechanical translation. 

10. Western Reserve University, $159,200 (1 year): A 
large-scale test program for evaluating the procedures that 
have been developed for the automatic processing and search- 
ing of literature of interest to metallurgists. This experi- 
mental partially mechanized information center will cover 
literature of interest to metallurgists. The program will ulti- 
mately include test searches and related studies designed to 
evaluate the results achieved with the use of mechanized pro- 
cedures. In addition, the user demands by metallurgists 
made on this mechanized searching service will be analyzed. 
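
As an illustration only, the permuted title index tested in items 1 and 3(c) above can be sketched in a few lines of modern code. The sketch is not the procedure used by any of the grantees, whose details are not given here; the function name, the stop-word list, the column width, and the sample titles are all assumptions made for the example. Each significant word of a title is brought in turn to a fixed column, and the resulting lines are sorted by that word.

```python
# Hypothetical stop-word list; a working index would use a much longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def permuted_index(titles, width=30):
    """Return keyword-in-context lines for the titles, sorted by keyword."""
    entries = []
    for number, title in enumerate(titles, start=1):
        words = title.split()
        for i, word in enumerate(words):
            keyword = word.strip(".,;:()").lower()
            if not keyword or keyword in STOP_WORDS:
                continue  # insignificant words are not indexed
            # Context to the left of the keyword, cut to the column width.
            left = " ".join(words[:i])[-width:]
            # The keyword and what follows it, cut to the column width.
            right = " ".join(words[i:])[:width]
            line = f"{left:>{width}}  {right:<{width}}  [{number}]"
            entries.append((keyword, number, line))
    entries.sort(key=lambda entry: (entry[0], entry[1]))
    return [line for _, _, line in entries]


# Sample titles invented for the illustration.
titles = [
    "Mechanical translation of Russian",
    "Indexing of chemical literature",
]
for line in permuted_index(titles):
    print(line)
```

Read down the middle column, the output lists every significant title word in alphabetical order, with its surrounding context and the number of the title in which it occurs.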

B. Projects indirectly related to the mechanized handling 
of information.— 

1. Association of Special Libraries and Information Bu- 
reaux, $16,700 (18 months): Study of the comparative effi- 
ciency of indexing and classification systems. Under a 
previous NSF grant, 18,000 aeronautical documents were in- 
dexed by each of four different systems. The group is now 
testing these different indexing and classification systems 
to determine their comparative efficiency. The project is 


135 





136 


DOCUMENTATION OF SCIENTIFIC INFORMATION 


indirectly related to mechanized handling of information, for 
some of the systems at least can be used with machines. 

2. Herner & Co., $11 5700 (1 year): Work on a design of a 
classification system, adaptable to mechanized searching de- 
vices, for atomic energy reports, based on the analysis of 
several thousand reference questions submitted to AEC 
libraries; and comparison of the new system with the subject- 
heading system now in use for AEC materials. 

3. Western Reserve University, $5,600 (1 year): A test pro-
gram to evaluate the two most widely used chemical notation
systems for structural formulas, i.e., systems that might be
used in printed indexes or mechanized searching systems or
both. The study has concentrated thus far on chemists’
ability to work with the systems.

C. Projects on mechanical translation.—All projects on
machine translation are also concerned with mechanized 
handling of information and are related to some of the in- 
formation retrieval projects, because in both fields an at- 
tempt is being made to analyze and process language. 

1. University of California, $57,600 (1 year): Research on
the machine translation of Russian technical literature in 
the field of biochemistry. 

2. Cambridge Language Research Unit (England) (see 
item A-2 for funds) (1 year): Basic research on new logico- 
mathematical methods for the analysis of languages for 
machine translation. (Partial support from Rome Air De- 
velopment Center.) 

3. Harvard University, $200,000 (1 year): Research on 
automatic translation of Russian into English. (Partial sup-
port from Rome Air Development Center.)

4. Massachusetts Institute of Technology, $126,000 (1 
year): Basic research on methods of machine translation, 
with experimentation on German into English. 

D. Clearinghouse and coordinating activities.—In order
to foster cooperation and coordination among researchers 
and laboratories working on related problems, the Founda- 
tion has taken the following actions: 

1. The Research Information Center and Advisory Service 
on Information Processing of the National Bureau of Stand- 
ards, Department of Commerce: On November 17, 1958, the 
Foundation announced the establishment of this Center, to 
be operated jointly by the Foundation and the National 
Bureau of Standards. Support for its first 2 years of opera- 
tion has been provided by the Foundation and the Council on 
Library Resources. The first annual report of the center, 
No. 6617, dated November 17, 1959 (copy enclosed), has been 
published by NBS and may be obtained by writing to the 
Office of the Director, National Bureau of Standards, Wash-
ington 25, D.C. The center staff has undertaken a continuing
study of available reports and publications about information
processing and is preparing reviews of progress in particular 
research areas. Other major objectives of the center’s col-
laborative program include—

(a) Fostering cooperation and coordination among Fed- 
eral agencies and private foundations supporting research and 
development programs in the field of information processing. 

(b) Collecting in one place current information on re-
search and development data on methods and equipment for
the automatic processing of scientific information.

(c) Providing technical assistance and advisory service as 
requested to Federal agencies and cooperating organizations. 

2. Current Research and Development in Scientific Docu- 
mentation, a Foundation publication issued semiannually as 
a guide to current projects here and abroad: The most recent 
edition (No. 5, NSF-59-54, for sale at GPO for 50 cents) 
reports approximately a hundred projects. 

3. Nonconventional technical information systems in cur- 
rent use, a Foundation series of descriptive reports on techni- 
cal information systems currently in operation which embody 
new principles for the organization of subject matter or em- 
ploy automatic equipment for storage and search: The most 
recent edition (No. 2, NSF-59-49, for sale at GPO for 30 
cents) contains reviews of 34 systems.

4. Research conferences and presentations: Another im- 
portant means of fostering cooperation among researchers 
and supporting agencies is the research conference or pres- 
entation. 

(a) The International Conference on Scientific Informa- 
tion of November 1958 in Washington, D.C., was rendered 
support and staff assistance by the Foundation. 

(b) A working conference of investigators in the field of
mechanical translation research is planned for July 1960 
with support from the Foundation and the Office of Naval 
Research. 

(c) A continuing series of seminars on individual research 
projects is arranged by the Foundation. 





Appendix B


EXAMPLES OF ACTIVITIES OF OTHER GOVERNMENT AGENCIES 
PERTAINING TO MECHANIZED HANDLING OF INFORMATION 


Related research projects are being carried out by other 
Federal agencies or various groups with Federal support. 

A. Information retrieval systems and procedures.—

1. The National Bureau of Standards and the U.S. Patent 
Office have undertaken a joint program of research and devel- 
opment concerned with new methods and new equipment that 
can be applied to the Patent Office search problem, and have 
also made contracts with other groups for related projects. 

The Patent Office’s research projects are aimed toward (1) 
construction and use of systems in selected areas of
technology, (2) design of systems of broad applicability, (3) 
development of optimum methods of data analysis and file 
preparation, and (4) utilization of present equipment and ex-
ploration into new devices for storage and retrieval. A nota-
tion system has been developed for transliterating raw text,
primarily scientific and technical, into a format suitable for 
data processing. 

The Data Processing Systems Division of the National 
Bureau of Standards is also broadly concerned with research 
and development in varied areas of potential applications of 
automatic data processing techniques, including the areas 
of information storage, search, and retrieval. The Bureau 
is working on the development of broader and more compre- 
hensive search programs, capable of both generic and specific 
searches. The Bureau is also providing support for
research at Massachusetts Institute of Technology on the pos- 
sibility of using natural language for storage and retrieval in 
a mechanized literature-searching system. 

2. The Air Force has contracted for several projects in this 
area: 

(a) Zator Co.: (1) Work on the development of a mathe- 
matical theory of information retrieval, and (2) investigation 
of the use of inductive inference to discover patterns of rela-
tionships among arrays of symbols.

(b) Ramo-Wooldridge: Experiments to determine the
extent to which retrieval efficiency and effectiveness can be 
improved by probabilistic indexing (the assignment of 
weighted index tags indicating their degree of relevance to 
the documents). 

(c) Herner & Co.: Investigation of optimal designs for
storage and retrieval systems for specialized bodies of infor-
mation, with particular emphasis on the use of nonmanipu-
lative correlative indexing.

(d) Lockheed Aircraft Corp.: Research to develop a “nor-
malized” English which is more amenable to machine ma-
nipulations than is the natural language. An algebraic rep-
resentation of syntax has been found which covers a large sub-
class of English sentences. It appears that it may be possible
to translate by machine from ordinary language to a “nor-
malized” language, then perform various operations such as
collating, indexing, abstracting, and making bibliographies
without human intervention.

(e) For several years, Air Force Office of Scientific Re- 
search has supported a number of different projects at Docu- 
mentation, Inc., including the development of the Uniterm 
system of coordinate indexing which is used in a number of 
places in mechanized searching systems. 

3. The Office of Naval Research is supporting theoretical 
work on the mechanization of information retrieval at the 
University of Pennsylvania. 

A good deal of relevant research and development work is 
also being conducted within and sponsored by intelligence 
agencies. However, this work is classified. 
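
The coordinate indexing and probabilistic indexing projects
listed under item 2 can be illustrated with a short sketch. The
Python below is a modern illustration and is not drawn from any
system named in this report. Each term has a "card" listing the
numbers of the documents posted under it, a search intersects
the cards for the requested terms, and the weight attached to
each posting is used to rank the result. The class name, the
sample terms and document numbers, and the choice to combine
weights by multiplication are all assumptions made for the
example.

```python
from collections import defaultdict


class CoordinateIndex:
    """Toy term-card index with weighted postings (illustrative only)."""

    def __init__(self):
        # term -> {document number: weight of the term for that document}
        self.postings = defaultdict(dict)

    def add(self, doc_id, weighted_terms):
        """Post a document under each of its terms with a relevance weight."""
        for term, weight in weighted_terms.items():
            self.postings[term.lower()][doc_id] = weight

    def search(self, *terms):
        """Return (doc_id, score) for documents posted under every term.

        The score is the product of the weights (an assumed combining
        rule); results are ordered from highest to lowest score.
        """
        cards = [self.postings.get(t.lower(), {}) for t in terms]
        if not cards:
            return []
        # Coordinate search: keep only numbers present on every card.
        common = set(cards[0])
        for card in cards[1:]:
            common &= set(card)
        scored = []
        for doc in common:
            score = 1.0
            for card in cards:
                score *= card[doc]
            scored.append((doc, score))
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


index = CoordinateIndex()
index.add(101, {"titanium": 0.9, "corrosion": 0.8})
index.add(102, {"titanium": 0.4, "corrosion": 0.5, "welding": 0.9})
index.add(103, {"corrosion": 0.7})
ranked = index.search("titanium", "corrosion")
```

Setting every weight to 1.0 reduces the sketch to plain
coordinate indexing, in which a document is either retrieved or
not and no ranking is implied.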

B. Machine translation projects.—The Office of Naval Re-
search, the Air Force, and the Army (including Signal Corps
and the Army Research Office) as well as the intelligence
community are supporting projects on machine translation.

The Signal Corps is sponsoring a German-to-English 
machine translation study at the University of Texas. 
Ramo-Wooldridge is working on the translation of Russian 
physics literature, using an IBM 704 computer, with some 
support from the Air Force. The Rand Corp. is conducting 
research on machine translation of Russian physics texts, 
with support from the Air Force. The National Bureau of 
Standards, under Army (Army Research Office and Office of 
Ordnance Research) sponsorship, is investigating a program 
for practical mechanical translation from Russian into Eng- 
lish. The University of Washington is working on machine 
translation of scientific Russian into English, with support
from Rome Air Development Center. The University of 
Milan (Italy) has a mechanical translation project supported 
by the Air Force. At Wayne State University, with support
from the Office of Naval Research, research on mechanical 
translation of Russian mathematical journals is being carried 
out. The Georgetown University program for Russian-
English translation of organic chemistry texts and French-
English translation of nuclear physics texts is sponsored by
the Central Intelligence Agency. 

Some of the Air Force support is closely associated with 
the National Science Foundation’s effort in that funds are 
transferred to the Foundation to be used for joint support of 
projects. 

The Foundation holds meetings with the other machine 
translation sponsors to coordinate the interests and programs 
of Federal agencies in the field. 

C. Equipment.—The Air Force is supporting at the Mag- 
navox Co. the development of the “Magnacard” equipment 
and a study to ascertain the feasibility of storing graphic 
information on Magnacards, along with digital information, 
for use in retrieval systems. The Air Force has also sup- 
ported the development of Minicard equipment for retrieval. 
The IBM Special Index Analyzer, which is now on the mar- 
ket, was designed especially for rapid search of term cards; 
its operation is based on a system of continuous, automatic, 
multiple comparison developed by Documentation, Inc., with 
Air Force support. 

Rome Air Development Center is supporting development
at IBM of a high-capacity rapid-access memory device (a
photoscopic disc) for use in mechanical translation and in
mechanized information searching systems.

Intelligent Machines Research Corp. (Alexandria, Va.)
has developed the print reader MX-2021 under contract with
the Air Force. The machine reads typewritten letters,
numerals, and punctuation marks simultaneously. The out-
put is punched paper tape.

Baird-Atomic, Inc., under contract with Rome Air Devel- 
opment Center, has been working on an optical filter print 
reader to obtain an electrical input for a high-speed Russian- 
English translating machine. 

Character recognition devices (reading machines), such as
are the objective of the two projects mentioned immediately
above, will be very important to practical, high-speed in- 
formation retrieval and mechanical translation. 
* * * * *

Among operating mechanized information services should 
be mentioned— 

The Armed Services Technical Information Agency of the 
Department of Defense.

A Man-Machine Information Center is operated for ONR
by Documentation, Inc. It consists of a mechanized index
of reports and provides bibliographic service to ONR con-
tractors in a highly specialized field.

Project ECHO, a highly mechanized system for the sub-
ject and management analysis of Government research and
development contracts, operated by Documentation, Inc., for
AFOSR.

For the Cancer Chemotherapy National Service Center of 
the National Institutes of Health, Documentation, Inc., op- 
erates a mechanized system concerned with chemical-biolog- 
ical test data. 

The Smithsonian Institution operates the Bio-Sciences In- 
formation Exchange, supported by a number of Government 
agencies, to organize and make available information about 
current biological and medical research projects. 

The Cardiovascular Literature Project, operated by the 
National Academy of Sciences-National Research Council 
for the National Heart Institute. 

The Index Medicus of the National Library of Medicine 
and the indexes to Nuclear Science Abstracts, published by the 
Atomic Energy Commission, are both prepared with the aid 
of machines. 


Appendix C


EXAMPLES OF NON-GOVERNMENT ACTIVITIES WHICH ARE DEVELOP- 
ING OR ARE SUPPORTING THE DEVELOPMENT OF MECHANIZED 
SYSTEMS 


A. Information retrieval systems and procedures.— 

1. The Council on Library Resources: 

(a) Ramo-Wooldridge—for an initial study of automatic 
word correlation to suggest ways in which the process of 
subject-analysis may be accomplished mechanically for auto- 
matic indexing. 

(b) National Library of Medicine—for the improvement
of bibliographic compilation through mechanization with 
specific application to the “Current List of Medical Litera- 
ture.” The results of this project are now evident in the new 
“Index Medicus,” which has replaced the “Current List” and 
which is prepared with the aid of punched card equipment 
and the automatic Listomatic camera. 

(c) AVCO Corp.—for the development of an experi-
mental high-density direct-access photostorage and re-
trieval system for library materials. The system would in-
clude a camera to make microimages of original material,
direct access photomemory mechanisms for selecting stored 
information and reproducing the content, and storage facili- 
ties for supplying information in electronic or optical form 
to a number of users. 

(d) University of Virginia—for a study of the uses of 
closed-circuit television in a decentralized library situation
(project completed). 

2. Eastman Kodak Co. in cooperation with other organiza-
tions is testing various systems of indexing and coding on
Minicard equipment to insure availability of adequate 
mechanization as new indexing systems evolve. 

3. Esso Research & Engineering Co. is working on a sys- 
tem for indexing internal technical reports with an IBM 101 
electric statistical machine. The system is a combination of 
two indexes: an alphabetic subject index in conventional 
printed form and a machine based coordinate index. 

4. FMA, Inc., is conducting research in the field of infor-
mation storage and retrieval, attempting to increase the effi-
ciency of the search process and studying the applications
of equipment under development.

5. The Human Relations Area Files is exploring the pos- 
sibility of a design for data-handling equipment which is 
adapted to the needs of dealing with the material contained 
in its library of sources on virtually every area of human be- 
havior. This work is being done under a grant from the 
Wenner-Gren Foundation for Anthropological Research. 
Machine programs will be devised to facilitate comparative 
statistical research with large masses of data. 

6. The Itek Corp. is continuing its development of
equipment and system designs for the handling of
documentary information as well as the study of industry
and government applications requiring the handling of
graphic information.

7. International Business Machines is conducting basic 
research in information retrieval and document analysis, di- 
rected toward the design of systems for automatic encoding, 
indexing, and abstracting of machine-readable documents. 
An autoindexing method has been developed for producing 
for a given set of documents a “keyword-in-context index” 
or “KWIC index.” The index is derived from titles, ab- 
stracts, or texts available in machine-readable form by es- 
tablishing a list of keywords and by extracting portions of 
titles or texts containing such keywords as a nucleus. These 
portions are then arranged in alphabetical order of the key- 
words to form the index. 

8. Magnavox Co. is carrying on research to determine the 
best means of applying its machine developments to informa- 
tion storage and retrieval and to develop a mathematical 
structure for information systems. 

9. Planning Research Corp. has developed a design for an
interrogation system employing a simplified form of Eng-
lish for use in converting visual (e.g., photographic) data
into digital data by a man-machine system. Research is pro-
gressing on document analysis, aimed at the development of a
fully automatic system for indexing and abstracting techni-
cal articles from machine-readable text. Also under investi-
gation is an approach to the potential application of com-
puter technology to the automatic abstracting of documents.

10. The System Development Corp. is working out methods
of using general-purpose digital computers in processing ab- 
stracts or document text to make written material more ac- 
cessible to literature searchers. The group is doing research 
on applications of modern data-processing techniques in the 
medical sciences, including the automation of medical data 
recording and handling procedures. The group is also inter- 
ested in the use of computers as an aid in medical diagnosis 
and selection of therapy. 

11. The American Society for Metals is supporting at 
Western Reserve University’s Center for Documentation and 
Communication Research a pilot study of the feasibility of 
searching metallurgical literature by means of computer- 
like devices, and the analysis of scientific and technical ter- 
minology for semantic content, with the ultimate goal of a 
system of coding for eventual machine storage and searching.
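
The keyword-in-context index described under item 7 can be
sketched in a few lines. The Python below is an illustration
under stated assumptions and is not the IBM program: the stop
list, the function name kwic_index, and the column width are
hypothetical, and keywords are taken to be every word of a
title that does not appear on the stop list.

```python
import re

# Hypothetical stop list of non-significant words; a working system
# would need a much longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_index(titles, width=30):
    """Build a keyword-in-context index from a list of titles.

    Returns (keyword, line) pairs in alphabetical order of keyword.
    In each line the keyword begins at column `width`, preceded and
    followed by as much of the title as fits, and the line ends with
    the serial number of the title.
    """
    entries = []
    for doc_id, title in enumerate(titles, start=1):
        for match in re.finditer(r"[A-Za-z]+", title):
            keyword = match.group().lower()
            if keyword in STOP_WORDS:
                continue
            # Context to the left of the keyword, trimmed to the column.
            left = title[:match.start()][-width:]
            # The keyword itself plus the context to its right.
            right = title[match.start():match.start() + width]
            line = f"{left:>{width}}{right:<{width}}  {doc_id}"
            entries.append((keyword, doc_id, line))
    entries.sort(key=lambda entry: (entry[0], entry[1]))
    return [(keyword, line) for keyword, _, line in entries]


sample_titles = [
    "Machine Translation of Russian",
    "Indexing of Metallurgical Literature",
]
for _, line in kwic_index(sample_titles, width=20):
    print(line)
```

Because every keyword starts in the same column, a reader can
scan down that column alphabetically and see each keyword with
the words that surround it in the title.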

B. Character recognition.—IBM, Bell Telephone Labora-
tories, RCA, the Sandia Corp., and the University of Michi-
gan are conducting work on character recognition by
machine.


Appendix D
PHYSICAL SCIENCES INFORMATION EXCHANGE 


In October 1959 the National Science Foundation reported 
to the Federal Council for Science and Technology its intent 
to establish a Physical Sciences Information Exchange. The 
objectives and scope of a service have been informally dis- 
cussed with representatives of other agencies having major re- 
search programs in the physical sciences. Our purpose was 
to get such a facility started as quickly as possible on a limi-
ted basis. Accordingly, the Foundation proposed initially to
fund the service in order to avoid delays. We estimated that 
operating costs would reach a level of $500,000 in about 3 
years. Informal conversations had also indicated that the 
Smithsonian Institution would be receptive to a proposal to 
administer the service. The Council was asked to endorse the 
proposed action and urge all Federal agencies concerned to 
participate in the clearinghouse and utilize it fully when
established. 

The proposed clearinghouse is to concentrate at first on col- 
lecting and correlating the records of all unclassified, extra- 
mural physical sciences research supported by the Federal 
Government. It is intended that each such grant or contract 
would be reported promptly to the clearinghouse as soon as it 
is formalized. The scope of service is to be expanded grad-
ually and as early as possible to encompass intramural Gov-
ernment research, proposals for Government support, classi-
fied research tasks and privately supported research. At this
stage, no plans are included for records of foreign research
supported by foreign funds. It appears that such coverage,
if desirable, will be possible only when better conditions exist
for the international exchange of information which will be
necessary. However, it is intended that the service will
include all U.S.-supported research in foreign countries.
Clearinghouse service is to be provided without charge to all
Government agencies and committees with responsibilities
involving the physical sciences, as well as participating pri-
vate organizations. In brief, our objective is a facility
oriented toward ultimate national participation and service.

The emphasis on Government activities in the physical 
sciences has made it most desirable to establish within the 
Government structure a responsible office for administration,
continuing interagency liaison and overall operational direc-
tion. This type of role is visualized for the Smithsonian In-
stitution. It has been recognized, however, that the services
of systems experts might be required to insure the design, de- 
velopment, and installation of the most effective and efficient 
system of operation. We consider it essential that the system 
be capable of automation and also ensure optimum retriev- 
ability of all stored units of information in order to serve the 
varying needs of participating organizations. Their diverse 
missions require different types of services and end-products, 
which as much as possible should be available from the nor- 
mal operations of the clearinghouse. The routine day-to-day 
operations of the installed system could presumably be per- 
formed by either a qualified private contractor or by a project 
staff of Smithsonian employees. Decision regarding the best 
method of operation has been deferred until operational re-
quirements and an appropriate system have been more fully
developed.

A further major consideration has been the Smithsonian 
Institution’s long experience in administering the Biosciences 
Information Exchange. It seems fairly evident that this ex-
isting service and the proposed physical sciences service
should eventually be merged at the earliest practicable date.
The ultimate objective for exchange of information on cur-
rent and proposed research should be a facility (e.g., a
Sciences Research Information Exchange) covering the life, 
physical and social sciences. From that viewpoint, admini- 
stration and direction by the Smithsonian Institution promise 
the greatest degree of compatibility between the biosciences 
and physical sciences clearinghouses, as a prerequisite to
eventual consolidation. 

By the end of January 1960 it was generally acknowledged 
that the proposed location under the Smithsonian Institution
represented the best current plan in all respects. If and 
when the services involve the Fanuniation of abstracts or 
documentary materials, the experience and facilities of the
Office of Technical Services, Department of Commerce, 
should be considered. In planning the automation of the 
clearinghouse system, full consideration will be given to the 
capacity and special characteristics of data processing equip- 
ment which may be available within the Department of Com-
merce.


Appendix E

Numbers of abstracts or title listings published by the National
Federation of Science Abstracting and Indexing Services

[Figures from NFSAIS, Feb. 2, 1960]

                                                      1958      1959      1960
                                                                   (estimated)

Abstracting services:
    Aero/Space Reviews                               8,500     9,150    10,000
    Applied Mechanics Reviews                        5,285     6,411     7,000
    Biological Abstracts                            47,547    62,559    72,500
    Chemical Abstracts                             117,656   125,440   145,000
    Engineering Index                               30,000    33,000    36,000
    Mathematical Reviews                             9,000     8,000     9,000
    Meteorological Abstracts and Bibliography        7,000     8,085    11,000
    Nuclear Science Abstracts                       18,000    23,127    30,000
    Psychological Abstracts                          6,100    11,242    12,500
    Review of Metal Literature                      12,027    11,191    12,000
    Technical Abstract Bulletin                     30,000    36,700    25,500
    U.S. Government Research Reports                 9,625    10,239    11,000
    Technical Translations                                    10,974    13,420
        Total abstracts                            300,740   356,118   394,920
Indexing services:
    Bibliography of Agriculture                     99,470    93,107    96,000
    Index Medicus                                  114,214   107,042   120,000
        Total title listings                       213,684   200,149   216,000
        Grand total                                514,424   556,267   610,920
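
The subtotals in the table can be checked by simple addition.
The Python below is a small sketch, not part of the original
submission, that adds the 1960 estimates in the order the
services are listed; the variable and function names are
chosen only for this example.

```python
# 1960 (estimated) figures for the thirteen abstracting services,
# in the order in which the table lists them.
abstracts_1960 = [
    10_000, 7_000, 72_500, 145_000, 36_000, 9_000, 11_000,
    30_000, 12_500, 12_000, 25_500, 11_000, 13_420,
]

# 1960 (estimated) figures for the two indexing services.
title_listings_1960 = [96_000, 120_000]


def check_totals(abstracts, listings):
    """Return (total abstracts, total title listings, grand total)."""
    total_abstracts = sum(abstracts)
    total_listings = sum(listings)
    return total_abstracts, total_listings, total_abstracts + total_listings


totals = check_totals(abstracts_1960, title_listings_1960)
print(totals)  # (394920, 216000, 610920)
```

The three results agree with the printed 1960 figures for total
abstracts, total title listings, and the grand total.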

U.S. DEPARTMENT OF AGRICULTURE LIBRARY


The following information was submitted to the staff of the com-
mittee by Mr. Foster E. Mohrhardt, Director of the Library, U.S.
Department of Agriculture, to supplement the information compiled
by the staff in the preparation of this report:


Since its establishment in 1862 the USDA Library has 
emphasized direct service to users. The Congress of the 
United States when it established the Department of Agri- 
culture was well aware of the importance of ready access 
to scientific information in the development of this country. 
Evidence is found in the organic act of the Department of 
Agriculture, which states that “the general designs and duties 
of which shall be to acquire and to diffuse among the people 
of the United States useful information on subjects connected
with agriculture in the most general and comprehensive sense 
of the word * * *” and that “it shall be the duty of the Com- 
missioner of Agriculture to acquire and preserve in his De- 
partment all information concerning agriculture which he 
can obtain by means of books and correspondence.” 

Collection of publications began immediately and in the
past century the Library has acquired on a worldwide basis
significant publications needed for agricultural research and
related uses. Emphasis has been given to publications in the 
field of chemistry which is basic to much of the research in 
this Department. The Chemistry Division was one of the 
first research units in the Department, and the work of the 
Department’s chemists has been outstanding through the 
years. 

In order to support the efforts of all the Department’s 
scientists, the Library has assembled one of the world’s most 
complete collections of printed materials (books, reports, 
journals, pamphlets, etc.) in such scientific fields as botany, 
agricultural bacteriology, entomology, forestry, soils, agri-
cultural engineering, zoology, veterinary medicine, livestock,
poultry, and plant pathology. 

Availability of the Department’s publications for world- 
wide exchange has enabled the library to set up continuing 
exchange arrangements with scientific societies, universities, 
libraries, and research institutions in practically every coun- 
try in the world. At present there are over 5,000 exchange 
points in other countries, and about 200,000 publications
are obtained each year from these arrangements. More than 
500 different scientific journal titles are received from the 
U.S.S.R. under exchange agreements. Surveys of foreign 
countries have been made and formal repositories have been 
established in many countries. This provides an efficient 
system for disseminating U.S. scientific information abroad 
and provides a channel for the steady flow of foreign data to 
this country. 

In addition, the Department library provides a clearing- 
house of foreign exchange information for all State land- 
grant institutions. 

Current, up-to-date notification of the library’s holdings 
is given monthly through the Bibliography of Agriculture. 
Language and subject-matter experts screen the journals, 
books, and other printed materials received by the library 
and prepare listings of articles under appropriate subjects so 
that they can be readily located by research workers. An 
objective evaluation of this publication was made by Dean 
Shera and Prof. M. Egan of Western Reserve University, 
who stated: “Some of the best comprehensive subject bibli- 
ographies in this country have been produced by special 
libraries of the Federal Government, such as the Department 
of Agriculture Library and the library of the Surgeon 
General’s Office.” 

More than 20,000 separate journal and serial titles are re-
ceived on a continuing basis by the Department library, and a
published list of these titles is available to libraries and re-
search workers. In addition to the regular Bibliography
of Agriculture, issued monthly, the library also compiles and
publishes special bibliographies in fields of importance to
Department research workers. Titles of some of these are:

“Bibliography of Plant Pathology in the Tropics and in
Latin America.”

“Tropical Beef Cattle Industry in the Western Hemi- 
sphere.” 

“Rice Hulls and Rice Straw.” 

“Cacao; a Bibliography on the Plant and Its Culture and 
Primary Processing of the Bean.”

“Forest Research Programs; a Selected Bibliography of
U.S. Literature.”

“Aircraft in Agriculture.”

“Drainage of Agricultural Land.” 

Although the library gives priority to employees of the 
Department of Agriculture, it actually serves as a national 
dissemination center for scientific information in agriculture 
and related sciences. Direct service is provided to Govern- 
ment officials, to research workers in the land-grant institu- 
tions, to students, to agricultural organizations, and to re-
search workers generally. As a result of its unique holdings
in chemistry and the biological sciences, particularly heavy 
use of the library is made by the chemical and pharmaceutical 
industries.

Two notable characteristics of the library in its 100-year 
history have been the emphasis given to direct reader service 
and the pioneering efforts in fields of scientific information 
dissemination. Some of the pioneering efforts have been:


1. Printing of library catalog cards 


Experimentation in the printing of these cards began in 
the USDA Library as early as 1900, and when the Library 
of Congress took over the nationwide responsibility in this 
field the Library continued to supply manuscript copies 
for this purpose to the Library of Congress. 


2. Photoduplication of articles 


Since 1911 this Library has made photographic copies of 
articles in its collection as a means of disseminating scientific 
information. Two special experimental projects in photo- 
copying have been carried out. The first was a cooperative 
program in 1934 with the American Documentation Insti- 
tute, using the USDA Library as a center for (1) extending
the use of its resources to isolated research workers, and (2)
decreasing interlibrary loans of bound volumes by supply-
ing copies of articles. A second project carried on from
1946 to 1956 was a joint arrangement with the American 
Chemical Society to provide copies of all articles which 
were listed in “Chemical Abstracts.” Both of these pro- 
grams aided in the carrying out of American scientific 
research. 


3. Development of the rapid selector 


With funds from the Office of Technical Services, De- 
partment of Commerce, Dr. Ralph Shaw developed experi- 
mentally a working model of the rapid selector. 







4. Use of punched cards in bibliographic work 


In calendar year 1949 the Library conducted an experi- 
ment in the use of electronic data-processing machines to 
produce the author and subject indexes to the “Bibli- 
ography of Agriculture.” 
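The report does not describe how the 1949 experiment was carried out. As a rough modern sketch of the underlying task only, the Python below derives an author index and a subject index from one set of citation records. The record layout, names, and item numbers are invented for illustration and are not taken from the "Bibliography of Agriculture."

```python
from collections import defaultdict

# Hypothetical citation records; each stands in for one punched card.
RECORDS = [
    {"item": 101, "author": "Smith, J.", "subjects": ["Cacao", "Drainage"]},
    {"item": 102, "author": "Jones, A.", "subjects": ["Rice hulls"]},
    {"item": 103, "author": "Smith, J.", "subjects": ["Rice hulls"]},
]


def sorted_index(index):
    """Return the index with headings in alphabetical order and items ascending."""
    return {heading: sorted(items) for heading, items in sorted(index.items())}


def build_indexes(records):
    """Build (author_index, subject_index); each maps a heading to item numbers."""
    authors = defaultdict(list)
    subjects = defaultdict(list)
    for record in records:
        authors[record["author"]].append(record["item"])
        for subject in record["subjects"]:
            subjects[subject].append(record["item"])
    return sorted_index(authors), sorted_index(subjects)


author_index, subject_index = build_indexes(RECORDS)
for heading, items in subject_index.items():
    print(heading, items)
```

Both indexes come from a single pass over the same records, which is the clerical saving a machine-produced index offers over compiling each index by hand.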

The Library is continuing its exploratory efforts to deter-
mine the usefulness of automation in its work and the adapt- 
ability of available machines for its particular activities. 
It is recognized that there are new developments in high 
density storage as well as the possibilities of a read-out 
machine. The Library is now carefully considering various 
components which might provide aid to the Library in its 
preservation of material and in enabling the users to have 
more ready and more complete access to publications. The
Library hopes to begin feasibility studies to determine the 
usefulness of presently available devices for mechanization. 

The Library works closely with the other national libraries, 
the Library of Congress, and the National Library of 
Medicine, in determining fields of primary responsibility 
and in taking measures to avoid any unnecessary duplica- 
tion of activity. 

The Department of Agriculture Library is an active par-
ticipant in many special committees and activities directly
concerned with scientific information and its dissemination,
such as:


American Institute of Biological Sciences, Biological Communications Study Group
American Institute of Biological Sciences, Committee on Translations
National Science Foundation, Science Information Council
National Science Foundation, Federal Advisory Committee on Scientific Information
National Federation of Science Indexing and Abstracting Services
International Association of Agricultural Librarians and Documentalists
Conference of Biological Editors
International Council of Scientific Unions, Abstracting Board


The Department of Agriculture Library maintains its in- 
terest in promoting the dissemination of scientific informa- 
tion and in the development of new methods and techniques.
Because of its national responsibility, the Library is actively 
working with representatives of land-grant institutions (Dr. 
Richard Chapin, director, Michigan State University Li- 
brary) and with professional groups such as the American 
Veterinary Medical Association, to determine possible meth- 
ods for a nationwide coordinated approach in specialized 
fields in making scientific information more readily available. 









Part II 


SCIENCE INFORMATION AND RETRIEVAL SYSTEMS AND 
PROGRAMS OF NONGOVERNMENT GROUPS 


In addition to the Federal agency reports included in the preceding
section of this report, representatives of non-Federal groups which 
were considered to be representative in this field were also contacted 
by the staff. 

Contacts were made with (a) certain industries which had been 
reported as having undertaken the study and development of systems 
for information retrieval to serve their own needs; (b) abstractors, 
institutes, or universities active in supplying information to users; 
(c) designers of systems for utilization of equipment required for 
data processing, or for mechanization of information indexing and re-
trieval program; and (d) designers and manufacturers of machines
and automation equipment. 

In response to requests by the staff, these groups, in most
instances, sent representatives to confer with members of the
staff, and/or submitted the following summaries relative to their
operations or prospectuses regarding the services which some of them
were qualified to render in the field of mechanized information
processing.


AMERICAN INSTITUTE OF BIOLOGICAL SCIENCES (AIBS)


The following report was received from Dr. Hiden T. Cox, execu-
tive director of AIBS, setting forth a brief outline of the efforts of 
that institute to improve its program in the biological sciences, which 
also sets forth the assistance given by the NSF in support of its trans- 
lation and abstracting programs and services. 


I am grateful that you thought of us in connection with 
your efforts to compile data on communications among 
biologists and other scientists. 

The AIBS has had a long standing interest in this major 
problem area and we have taken a number of steps to try 
to improve what is a really deplorable situation. The mag- 
nitude of the communications problem is almost incredible 
and I am afraid that our efforts, which to us are herculean, 
have had only a depressingly limited impact. 

I am sure you are aware of our translations program by 
which we are trying to break down the language barrier be- 
tween English speaking and Russian speaking biologists. 
This program, now 4 years old, is supported in large part by
the National Science Foundation. It consists of cover-to- 
cover translations of seven Soviet biological journals as well 
as translations of a number of reference works. In addition 
we have a project in the preparation of a supplement to the
Bibliography of the Eastern Asian Flora. The updating of
this monumental work in botany requires translation from 
Chinese, Russian, Japanese, Korean, and some of the south- 
east Asian languages. To this end we have people working 
in Washington and in Japan. This project also is being sup- 
ported by the National Science Foundation. 

We are likewise concerned with attacking the language bar- 
rier between Chinese speaking and English speaking 
biologists. We are proposing that the National Science 
Foundation support a program in translation of mainland 
Chinese biological materials. We do not propose to trans- 
late extensively on a cover-to-cover basis. We would prefer 
to have a more feasible operation in which highly significant
articles would be translated and the run-of-the-mill publica-
tions would not receive this attention and expense. We
propose to produce review articles, summaries, abstracts, and 
the like. Since we feel that we can get adequate translation 
done for less money and because we feel that relationships 
between the United States biologists and Chinese biologists 
would be improved, we propose to establish Chinese transla- 
tion centers probably in Taipei and in Hong Kong. 

Obviously, one of the major difficulties encountered today 
in communications has to do with our traditional method of 
publication. Scientific journals, at least in biology, have 
small circulation. They are expensive and by and large have 
no subsidy. Many journals are at the point of pricing them- 
selves right out of the field and to combat this many have 
been forced to reduce their size. The backlog of unpublished 
research articles is growing longer by the day and it is not 
uncommon for biological research to wait from 1 to 3 years 
to see the light of day in published form. The AIBS with 
the Wildlife Disease Association has started an experimental 
journal which appears completely on microcards. This proj-
ect is supported jointly by the Council on Library Resources
and the National Science Foundation at least for the first 
3-year period. I think you might be interested in this radi- 
cally novel medium of publication and I am, therefore, en- 
closing a specimen. The cost of this type publication is a 
fraction of its cost by conventional methods. 

You are no doubt familiar with the chaotic condition of 
publication in the biological sciences throughout the world. 
There are now at least 30,000 journals which publish original 
biological research papers. A conservative estimate would 
put the total output at a million and a half articles a year. 
Obviously, major efforts must be undertaken to acquire and 
distribute these articles to scientists who have need of them 
in their research work. We have been studying this and re- 
lated problems through a committee on communications es- 
tablished by the institute about a year ago. We are now pro- 
posing to establish a permanent Biological Communications 
Study Group with primary support to be sought from the
National Science Foundation. This study group, with ap-
propriate staff, will have the following charge: 

1. Assess present and future needs of biologists for com- 
munications services. 

2. Analyze and evaluate the communication services pres- 
ently available to biologists. 

3. Make recommendations for a coordinated program of 
communications services in the area of the biological sciences. 

4. Establish program and operating units which will
further communications among biologists and between biol- 
ogists and other scientists. 

When this project gets started it will take financial under- 
writing at a rate estimated to be not less than $3 million a 
year for the next 5 years. While this amount is consider- 
able, it is but a drop in the bucket when compared to the re- 
lease of time now being spent by productive research men in 
tedious and often fruitless searching of the literature. One 
of the most effective ways of immediately increasing the sci-
entific research output of this Nation’s biologists is to make
available to them the maximum amount of time for produc-
tive investigation. This can be done at relatively little cost
if we can improve the communications between the U.S. 
biologists and their colleagues throughout the world. 
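The figures quoted in this letter can be put side by side with a little arithmetic. The sketch below uses only the numbers given above (30,000 journals, a million and a half articles a year, $3 million a year for 5 years). Because the letter states each of them as a floor ("at least," "not less than"), the results are rough lower-bound illustrations, and the dollars-per-article ratio is a derived figure that does not appear in the letter.

```python
# Figures quoted in the AIBS letter, each stated there as a minimum or estimate.
JOURNALS = 30_000
ARTICLES_PER_YEAR = 1_500_000
ANNUAL_UNDERWRITING_DOLLARS = 3_000_000
PROGRAM_YEARS = 5

# Average yearly output of one journal.
articles_per_journal = ARTICLES_PER_YEAR / JOURNALS

# Total underwriting over the proposed period.
total_underwriting = ANNUAL_UNDERWRITING_DOLLARS * PROGRAM_YEARS

# Derived ratio (not in the letter): yearly underwriting per article published.
dollars_per_article = ANNUAL_UNDERWRITING_DOLLARS / ARTICLES_PER_YEAR

print(articles_per_journal, total_underwriting, dollars_per_article)
```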


AVCO CORP. (CROSLEY DIVISION)


In response to the staff request, Brig. Gen. Monro MacCloskey, 
Assistant to the President, submitted the following description of a 
system currently under development by the Crosley Division of Avco 
Corp. for data indexing, searching, retrieval, and reproduction. The 
design objectives of this development are given in the following 
paragraphs. 

Verac 903


Avco/Crosley’s Verac, a revolutionary method for record- 
ing, storing, indexing, remote viewing, and reproduction of 
up to 10 million legal-size documentary pages, puts a life- 
time of accumulated knowledge instantaneously at your 
fingertips. 

Using a high-speed, high-resolution reduction camera— 
an image a second—and a high-density storage—1 million 
pages per cubic foot—direct access file as integral parts, 
Verac combines the speed, fidelity of reproduction, and space 
savings mandatory in a modern filing and storage system. 

In less than 1 second Verac automatically selects one page 
from a million, displays it, and, if desired, produces a full- 
sized copy. 

Because Verac can be coupled to the output of any search 
system (printed indexes and/or special or general purpose 
computers, etc.), a high degree of versatility is assured. 
Other features of this rapid access, high-density storage sys- 
tem are: 

Separates search and retrieval functions for maximum
reliability and flexibility;

Radically reduces document storage area through mi-
crophotographic techniques while still maintaining
open-shelf accessibility;

Insures preservation of original document since docu-
ments are never removed from file;

Services a number of users simultaneously;

Assigns individual attention and priority to each re-
quest;

Stores documents without additions or alterations.

These features, together with economy of operation and in-
stallation, make Verac the logical solution to the problem of
selection and reproduction of quantities of stored informa-
tion.


Elements of Verac 


A paramount consideration in the design of Verac is 
flexibility of component and element arrangement. This 
approach was taken to assure that Verac could perform any 
job—large or small, complex or simple. 

The step and repeat camera and the direct access file, 
which form the heart of the Verac system, are described
below with some possible equipment configurations. 

Step and repeat camera.—The step and repeat camera 
forms a reduced image on the memory plane film with a high 
resolution objective lens. After exposure of each document, 
the memory plane is translated automatically so that the 
center of the next column is alined with the optical axis of 
the objective lens. Image recording rate will vary with oper- 
ator ability. The camera can, however, record at the rate 
of one image per second. 
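The stated figures allow a rough estimate of what filling the file would involve. The sketch below takes the 10-million-page capacity, the density of 1 million pages per cubic foot, and the top camera rate of one image per second from the description above. The 8-hour working day is an assumption added here for illustration, and since the text notes that the real rate varies with operator ability, the times are best-case figures.

```python
CAPACITY_PAGES = 10_000_000        # basic capacity quoted for the direct access file
PAGES_PER_CUBIC_FOOT = 1_000_000   # storage density quoted in the description
MAX_IMAGES_PER_SECOND = 1.0        # top camera rate quoted in the description


def recording_days(pages, images_per_second, hours_per_day=24.0):
    """Days needed to record `pages` at a steady rate for `hours_per_day` each day."""
    hours = pages / images_per_second / 3600.0
    return hours / hours_per_day


storage_cubic_feet = CAPACITY_PAGES / PAGES_PER_CUBIC_FOOT
continuous_days = recording_days(CAPACITY_PAGES, MAX_IMAGES_PER_SECOND)
# Assumption for illustration only: one camera run 8 hours a day.
shift_days = recording_days(CAPACITY_PAGES, MAX_IMAGES_PER_SECOND, hours_per_day=8.0)

print(storage_cubic_feet, round(continuous_days, 1), round(shift_days, 1))
```

On these assumptions the film occupies about 10 cubic feet, while recording a full file takes roughly 116 days of uninterrupted operation, or about 347 eight-hour days.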

Direct access file.—The direct access file is a random access
photo memory for storing microphotographic reproductions
of documents. Image fields are permanently recorded on a 
high resolution photographic storage medium. 

The memory system automatically manipulates the storage 
media in response to address command signals. An optical 
or electrical output signal is then obtained corresponding 
to the information in a designated microphotographic field. 

Microphotographic images are arranged in rows and 
columns on a group of matrix memory planes. The location 
of the image field is uniquely specified by three numbers: 
row number, column number and plane number. 

Digital feedback control techniques are used to register 
designated microphotographic image fields with the optical 
axes of the output systems. The direct access file has a basic 
capacity for 10 million legal-size pages. Expansion of this 
basic capacity is possible to suit individual requirements. 
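The three-number address (row, column, plane) can be illustrated with a small conversion routine. The prospectus does not give the dimensions of a memory plane, so the values below (100 rows, 100 columns, 1,000 planes) are assumptions chosen only because they multiply out to the stated 10 million pages. The function names are likewise hypothetical.

```python
# Assumed dimensions, not from the prospectus: 100 * 100 * 1,000 = 10 million pages.
ROWS_PER_PLANE = 100
COLUMNS_PER_ROW = 100
PLANES = 1_000
PAGES_PER_PLANE = ROWS_PER_PLANE * COLUMNS_PER_ROW
CAPACITY = PAGES_PER_PLANE * PLANES


def page_to_address(page):
    """Convert a sequential page number (from 0) into (row, column, plane)."""
    if not 0 <= page < CAPACITY:
        raise ValueError("page number outside file capacity")
    plane, within_plane = divmod(page, PAGES_PER_PLANE)
    row, column = divmod(within_plane, COLUMNS_PER_ROW)
    return row, column, plane


def address_to_page(row, column, plane):
    """Inverse of page_to_address."""
    return (plane * ROWS_PER_PLANE + row) * COLUMNS_PER_ROW + column


print(page_to_address(1_234_567))
print(address_to_page(45, 67, 123))
```

Because every page maps to exactly one triple and back again, any search system that can emit such an address can drive the file, which is the separation of search from retrieval that the description claims.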

Search.—Two methods of indexing can be used to initiate 
search procedures. 

Alphabetical indexes can be stored in the direct access file. 
An automatically produced directory would list the file 
address numbers. 










For exhaustive studies, a magnetic tape searching machine 
can be used. The search result is punched on paper tape and 
can then be— 

Printed out as lists of file address numbers; 

Fed to the direct-access file for viewing, page-size re-
production, etc.;

Used to produce a special printed bibliography. 
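The flow just described, in which one search result is reused in several ways, can be sketched as follows. This is a loose illustration and not a model of the Avco equipment: the index records, addresses, and function names are all invented. A Python list stands in for the magnetic tape and another for the punched paper tape.

```python
# Hypothetical index records: (file address, title, indexing terms).
INDEX_TAPE = [
    (1001, "Drainage of Agricultural Land", {"drainage", "soils"}),
    (1002, "Aircraft in Agriculture", {"aircraft", "spraying"}),
    (1003, "Rice Hulls and Rice Straw", {"rice", "soils"}),
]


def search_tape(tape, term):
    """Scan every record in order and collect the addresses that match."""
    return [address for address, _title, terms in tape if term in terms]


def address_list(hits):
    """Render the result as a printed list of file address numbers."""
    return "\n".join(str(address) for address in hits)


def bibliography(tape, hits):
    """Render the same result as an alphabetical list of titles."""
    wanted = set(hits)
    return sorted(title for address, title, _terms in tape if address in wanted)


hits = search_tape(INDEX_TAPE, "soils")  # stands in for the punched paper tape
print(address_list(hits))
print(bibliography(INDEX_TAPE, hits))
```

The same list of addresses could equally be passed to the file for viewing or copying, so the search is run once and its result serves all three uses.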

Document Output.—Output can be obtained by— 

Electronic display: Resolution provided by this equip- 
ment insures sharply focused images permitting per- 
fect legibility of a full-sized typewritten page. 

High-resolution microfilm: A microfilm reproducer will 
reproduce selected images automatically on 35-milli- 
meter microfilm. 

High-resolution electrostatic printer: High-resolution 
electrostatic printer reproductions are possible at the 
rate of one page per second. 

Low-speed facsimile printer: The low-speed facsimile
printer makes reproduction at remote locations on inex-
pensive facsimile recording paper. One 8- by 14-inch
page may be reproduced within 1 to 2 minutes.

High-speed facsimile printer: The high-speed facsimile
printer reproductions are produced at the rate of 1
to 3 pages per second.
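The quoted output rates differ by about two orders of magnitude, which is easier to see with a worked comparison. The sketch below uses only the rates stated above. The 100-page job is an arbitrary example, and the electronic display and microfilm reproducer are left out because no rate is given for them.

```python
# Seconds per page for each device, as (fastest, slowest), from the quoted rates.
SECONDS_PER_PAGE = {
    "electrostatic printer": (1.0, 1.0),        # one page per second
    "low-speed facsimile": (60.0, 120.0),       # 1 to 2 minutes per page
    "high-speed facsimile": (1.0 / 3.0, 1.0),   # 1 to 3 pages per second
}


def job_seconds(device, pages):
    """Return (best case, worst case) time in seconds to reproduce `pages` pages."""
    fastest, slowest = SECONDS_PER_PAGE[device]
    return pages * fastest, pages * slowest


for device in SECONDS_PER_PAGE:
    best, worst = job_seconds(device, 100)
    print(f"{device}: {best:.0f} to {worst:.0f} seconds for 100 pages")
```

On these figures a 100-page job takes under 2 minutes on either high-resolution printer but between 100 and 200 minutes on the low-speed facsimile printer, which is the price of output at a remote location.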

Applications 

The flexibility and versatility of Verac can be proved 
wherever masses of statistical, historical, and administrative 
records are kept. Shown below are a few of the many possible
applications of Verac: 

Library documents. 

Correspondence. 

Drawings. 

Patents. 

ASTIA cards. 

Indexes. 

Abstracts. 

Books. 

Requisitions. 

Financial records. 

Personnel records. 

Engineering and technical data. 

Administrative memorandums. 

Journals and other periodic literature. 

Law enforcement records. 


BELL TELEPHONE LABORATORIES







Mr. W. O. Baker, vice president—research of the Bell Telephone 
Laboratories, who had previously testified before this committee in 
1958 at its hearings relative to the status of scientific information 
processing, the need for improvement of Federal activities in this
field, and the contributions which might be made by industry, wrote
the staff, on January 14, 1960, as follows: 


We are glad to have indication of your continued inter- 
est in the matter of science information processing. I have 
already had useful relationships with your studies. 

We shall immediately begin preparation of an outline of 
the various science information retrieval facilities we em- 
ploy in this company. Our system is under the direction of 
Mr. W. K. Lowry, manager of technical information libraries.
Mr. Lowry, incidentally, has had broad experience also in
the Federal science information agencies, including ASTIA.

With respect to your memorandum on this staff study, I 
am delighted to infer that you are taking a progressive in- 
terest in the way the diverse Federal agencies themselves 
operate their science information processing. This is one of 
the urgent issues for improvement of the national position. 
While during the past 2 years, in which your committee, the
President’s Science Advisory Committee and other competent 
bodies have been interested in this pressing issue, great gains 
have been made in academic and industrial science and tech- 
nical information utilization, relatively little improvement 
has been made in the information techniques of certain pub- 
lic agencies themselves. This fact is so widely appreciated 
that I am indeed astonished at the implication on page 2 of 
your memorandum that there is opposition to a study pro- 
posed for the physical sciences by the Smithsonian Institu- 
tion. I wonder, in fact, whether any such opposition is based 
on responsible and valid understanding of what is intended. 
Certainly a study would itself make no alteration of the 
policy of “full utilization * * * of facilities of industries” 
and would in no way presume “Government * * * control”. 

My continuing deep concern with the position and utiliza- 
tion of the scientific literature for the national security and 
public welfare, as well as for industrial progress, impels me 
increasingly to realize how impractically many sources are 
treating this issue. For instance, many Government contrac- 
tors, as well as Federal agencies, imagine some miraculous 
ordering force will display treasures of knowledge automati- 
cally and instantaneously when the right arrangement of 
“literature buttons” is pressed. This is a serious fallacy, 
and we must invoke every possible influence to educate our 
coming generations of scientists and technologists in the dili- 
gence and acuity necessary for using the accumulated knowl-
edge of man. Along with the certainty that no simple gadget
will substitute for the intellect in this endeavor, we must 
place the equal certainty that proper research can improve 
and expedite the use of the literature. However, this research 
must be done at a level of sophistication in linguistics, auto- 
mata, and combinatorics which is not now available. It is 
the development of such resources for new knowledge that 
chiefly fills the National Science Foundation’s present efforts 
in this field. In the absence of better basic understanding, no
amount of contract activity by eager industrial hardware
merchants will advance the national capability in this field. 


After consultations with the staff, Mr. W. K. Lowry, manager, in-
formation libraries of the Bell Telephone Laboratories, submitted to
the staff, on February 9, 1960, the following information relative to
the operations of the information retrieval systems in use at its labor-
atories and comments on the effectiveness of other systems with which
he was familiar:


In your recent correspondence with Mr. W. O. Baker you 
requested a brief outline of the information retrieval systems 
in use at the laboratories and our comments on the effective- 
ness of other systems with which we are familiar. Although 
good progress has been made in the utilization of technical 
information, much of it has been due to improvements in 
information-handling functions other than those concerned 
with retrieval. This has been true in our own case, and we 
believe it is probably true in other organizations with which 
we are familiar. I hope the following comments will be 
helpful in your attempt to determine the progress being made 
in science information processing. 

There are generally available two methods of organizing
information: by random placement and by classified place-
ment. An example of the first is found in the case of a bank 
maintaining a record of its customers’ balances on magnetic 
tape. When a customer opens his account, his name or ac- 
count number would be associated with his balance and this 
information is entered in random position on the tape along 
with that for other customers. The bank periodically re- 
views its total customer balance records for changes in the 
records. The search system is sequential and covers the com- 
plete record filed on tape. In the classified method for 
organizing information the entire record need not be scanned 
since it is arranged by predetermined groupings of informa- 
tion into classes and subclasses which may be used as a re- 
trieval approach. There are two types of systems generally 
used for classified storage and retrieval of information, the 
hierarchical method commonly used in libraries for many 
years and the coordinate method, which is more modern. 
Both have their advantages and disadvantages. Both have 
been used at the Bell Telephone Laboratories for many years. 
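The contrast drawn above between random and classified placement can be sketched in a few lines. The bank records, class names, and document titles below are invented for illustration. A Python list plays the part of the magnetic tape, and a nested dictionary plays the part of a file arranged by class and subclass.

```python
# Random placement: records sit in arrival order, as on the bank's tape.
ACCOUNTS_TAPE = [("A-103", 250.0), ("A-007", 1200.0), ("A-055", 75.5)]


def sequential_lookup(tape, account):
    """Scan the tape from the start; return (balance, records scanned)."""
    scanned = 0
    for number, balance in tape:
        scanned += 1
        if number == account:
            return balance, scanned
    return None, scanned


# Classified placement: records are filed under predetermined classes and subclasses.
CLASSIFIED_FILE = {
    "Agriculture": {"Drainage": ["Report 12"], "Forestry": ["Report 31", "Report 40"]},
    "Chemistry": {"Polymers": ["Report 7"]},
}


def classified_lookup(files, main_class, subclass):
    """Go straight to one class and subclass without scanning the rest of the file."""
    return files.get(main_class, {}).get(subclass, [])


print(sequential_lookup(ACCOUNTS_TAPE, "A-055"))
print(classified_lookup(CLASSIFIED_FILE, "Agriculture", "Forestry"))
```

The sequential search has to pass every earlier record, and an unsuccessful search reads the whole tape. The classified lookup reads only one group, but it works only for questions that match the groupings chosen in advance.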

Coordinate methods are used to retrieve information from 
patent files and from files of security-classified documents. 
Both systems use manual methods for storage and retrieval 
and both have been tailored to serve our local requirements. 
They are supplemented by hierarchical systems published 
by the U.S. Patent Office and the Armed Services Technical 
Information Agency (ASTIA) as well as by internal an- 
nouncement bulletins to the laboratories staff. The systems 
used are relatively effective but expensive to maintain and 
not as good as we would desire. Machine systems have been 
investigated but to date none have been adopted. Major 
factors in delaying the use of machines have been the resolu- 
tion of fundamental problems of a research nature and what
appear to be doubtful economic advantages of current mech-
anized systems for these files. Our study of information- 
processing systems is being continued and experiments are 
underway to obtain additional data. 
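The letter does not describe how the laboratories' coordinate files were arranged. As a general illustration of coordinate indexing as the term is commonly understood, each document number is posted under every term that describes it, and a search combines terms by intersecting their postings. The terms and document numbers below are invented.

```python
# Hypothetical postings: each indexing term lists the documents filed under it.
POSTINGS = {
    "transistor": {1, 4, 7, 9},
    "germanium": {2, 4, 9},
    "amplifier": {4, 5, 9, 11},
}


def coordinate_search(postings, *terms):
    """Return the documents posted under every one of the given terms."""
    if not terms:
        return []
    posting_sets = [postings.get(term, set()) for term in terms]
    return sorted(set.intersection(*posting_sets))


print(coordinate_search(POSTINGS, "transistor", "germanium"))
print(coordinate_search(POSTINGS, "transistor", "germanium", "amplifier"))
```

Unlike a hierarchical scheme, the terms need not be arranged in advance into classes and subclasses. Any combination can be asked for at search time, at the cost of maintaining a posting for every term on every document, which is consistent with the letter's remark that such systems are expensive to maintain.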

Hierarchical methods are used in our technical libraries 
and in many other departments where information is filed 
and retrieved. Machines of various types are employed as 
aids in storing and retrieving information, but none of the 
systems is completely automatic. Equipment used includes 
facsimile transceivers, flexowriters, electronic simulators, 
electronic accounting machines, computers, microfilm and 
microcard apparatus, high-speed reproduction equipment, 
photographic and optical devices, and audio and visual appa- 
ratus. Much use is made of paper tape, magnetic tape,
punched cards, and more advanced techniques of storing in-
formation required for laboratories projects and experiments.

Our approach to the problems of information handling is 
that usually practiced in systems engineering. This in- 
volves definition of the problem areas, investigation of the 
state of the art for possible solutions, application of research,
establishment of system parameters, and design, develop- 
ment, and testing of prototypes. To date we have concen- 
trated our efforts on the first three aspects with respect to 
the overall problem and have used machines as aids in con- 
nection with specific information functions when it has been 
advantageous to do so. 

Communication involves a number of functions, storage 
and retrieval being just two of these. It is perhaps unfor- 
tunate that those concerned with problems of information 
storage and retrieval have frequently neglected giving due 
consideration to other aspects of the communication process. 
It may well be the underlying reason for failure of many of 
the storage and retrieval systems which have been designed. 
Communication begins in the mind of man and becomes a 
tangible form when first spoken or written. At this point 
the formal communication process starts operating. In its 
simplest form, information is relayed from its source directly 
to a recipient for fruitful application. In a large commu- 
nity of scientists, this direct contact is not possible and conse-
quently an elaborate system of communication devices and
filters has been developed as an alternative to direct contact.

Although these devices have been useful in certain respects, 
they have also presented new problems in communication. 
Attempts have been made to cope with the mass of informa- 
tion by preparing abstracts, indexes, classifications, bibli- 
ographies and other filtering devices. In the process, 
meaning has been lost or distorted. In many instances the 
use of such devices requires a specialized knowledge of their 
structure and limitations which is held only by those who 
have designed them. This has proved to be a nuisance to 
the scientist requiring to use them. As in the case of pri- 
mary publications, the number of secondary publications 
(information filters) has become so great that it is not possi-
ble for scientists to keep pace with even that information
which has been filtered. 

The foregoing demonstrates some of the functions of com- 
munication which are commonly used today. There are 
others which must also be taken into consideration when 
general improvements are desired. And to gain such im- 
provements it is necessary to recognize the functional inter- 
play which takes place as well as the man-machine 
relationships bearing on this interplay. The attached table 
indicates some typical functions of a communication system 
for handling written information and demonstrates the rela- 
tional influence of one upon the other. Similar relation- 
ships for each function are shown for semantic, machine, 
and operational problem areas of communication. 

We believe that major improvements in information 
processing will be realized if information systems are de- 
signed with due regard for (1) the functional interplay of 
the various aspects of the system, (2) the requirement for 
competent research in linguistics, (3) the development of 
compatible equipment for information processing functions, 
(4) the need for competent operators of the system, and (5) 
adequate support and facilities. It is unlikely that improve- 
ment of hardware for storage and retrieval will in itself be 
very effective unless these requirements are met. 

* * * * * 





































Some functions, interactions, and problem areas of written communication

[Figures refer to functions]

Communication function                     | Functional interplay              | Semantic problems   | Machine problems | Operational problems
1. Writing and editing                     | 2, 4                              | [illegible]         | [illegible]      | [illegible]
2. Printing and publishing                 | 1, 4                              | [illegible]         | [illegible]      | [illegible]
3. General announcement                    | 2, 4, 7                           | [illegible]         | [illegible]      | 2, 4, 7
4. Local information requirements          | 10, 12                            | [illegible]         | [illegible]      | [illegible]
5. Local acquisition                       | [illegible]                       | [illegible]         | [illegible]      | [illegible]
6. Local cataloging, indexing, and coding  | [illegible]                       | [illegible]         | [illegible]      | [illegible]
7. Local announcement                      | [illegible]                       | [illegible]         | [illegible]      | [illegible]
8. Local [illegible]                       | [illegible]                       | [illegible]         | [illegible]      | [illegible]
9. Local retrieval systems                 | [illegible]                       | [illegible]         | [illegible]      | [illegible]
10. Local supply or reproduction           | [illegible]                       | [illegible]         | [illegible]      | [illegible]
11. Local [illegible]                      | [illegible]                       | [illegible]         | [illegible]      | [illegible]
12. Local [illegible]                      | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 | 1, 2, 3, 4, 6, 7, 9 | 9, 10, 11        | 2, 3, 5, 6, 7, 8, 9, 10, 11

1 A problem of the function itself.






Automatic equipment for information handling 

Mr. Lowry also supplied to the committee the following report he 
made at the working conference on “Automatic Documentation in 
Action,” held at Frankfurt on June 9-12, 1959, since it provides fur- 
ther details on the use of automatic equipment for information han- 
dling at the Bell Telephone Laboratories, which are pertinent to this 
report. 




At the International Conference on Scientific Informa- 
tion held last November in Washington, it was apparent 
that during the past 10 years there had been numerous at- 
tempts to use machines for coping with documentation prob- 
lems but that many of these had failed because fundamental 
problems remained unsolved. It was also pointed out that 
many machine proponents had made claims for systems of 
information handling which could not be substantiated from 
an economic point of view or, in fact, from the standpoint of 
technical feasibility. These are valid criticisms which can 
be verified by the lack of success achieved by many machine 
systems, and it was well to emphasize the need for caution 
in using machines for handling information. The confer- 
ence made a notable contribution as a sobering influence on 
unrestrained enthusiasts. 

I believe, however, that the Washington meeting may have 
had an unfortunate effect in that it discouraged many who 
were there on the practical advantages which machines do 
offer for information activities. Certainly we must not ex- 
pect that we can assign to machines intellectual problems 
which the human mind has been unable to solve. Nor can we 
hope to change overnight the habits of scientists and engineers 
in their intellectual quest for information by simply designing 
a sophisticated maze of circuitry and equipment for informa- 
tion retrieval. And even if such machines can be designed, 
there are many who are concerned with the effect they may 
have on scientific procedure as we have known it during the 
past century. This procedure has resulted in significant 
progress to date, and new techniques for gaining knowl- 
edge may adversely affect our proven scientific methods. 
This, however, is a question for those concerned with social 
philosophy. Our concern is with the use of automatic equip- 
ment in respect to documentation. More precisely, we have 
met to explore the practical applications for such equipment 
in information-handling functions. 

What, then, are the possibilities for using automatic equip- 
ment in the business of handling information? We have 
noted that machines should not be expected to solve the intel- 
lectual problems of documentation, although this does not 
imply that they will not influence the methods for finding 
solutions. Certainly they are being employed for this pur- 
pose in the hope that we may reduce the difficulties of trans- 
lation, and in other intellectual problem areas of documenta- 
tion. But their greatest value at the present time is not as 
an aid in solving fundamental problems. Of far greater 
import are the advantages apparent in using machines for 
clerical operations associated with an information service. 
This has been recognized by R. A. Fairthorne, Ralph Shaw,
and others who were quick to perceive the utility of machines
for this purpose in documentation work. Both Shaw and
Fairthorne are well aware of the intellectual problems to be
solved and their suggestions have great practical value for
documentalists.







There are, of course, many competent individuals in docu- 
mentation work who prefer to concentrate on the professional 
aspects of a problem and who feel that problems of a clerical 
nature are unimportant or uninteresting. But for those who
are charged with the responsibility and general administra- 
tion of technical information activities, it would be fool- 
hardy to overlook the importance of both. Our concern, and 
I assume this holds for most of us at this meeting, is not 
only with documentation per se but with the contribution 
which documentation services make to the intellectual prog- 
ress of science. The fact that much of our work is of a 
clerical nature does not mean that our product is profession- 
ally unimportant or that it can be accomplished without in-
tellectual effort. But all too frequently there is a tendency
to minimize the importance of routine operations and to over- 
look methods of reducing or improving them. When this 
happens we reduce the potential value of an information 
service by allotting a high percentage of our staff and other 
resources to clerical functions. 

At the Bell Telephone Laboratories, an effort of consider- 
able size was necessary to improve a simple clerical opera- 
tion. To do this required many millions of dollars and the 
combined efforts of some of the best scientific and engineer- 
ing talent that could be brought together. It took many 
years to solve the problem and it was done by the use of auto- 
matic equipment. In this instance, the facts were clear that 
if the Bell System was going to be able to provide adequate 
telephone service, something had to be done to improve the 
ability to connect two parties who wished to talk. The old 
method of working through a manually operated switchboard 
required the use of thousands of telephone operators and a 
statistical study indicated that in coming years there would 
be a shortage of operators to meet the anticipated require- 
ments for telephone service. The clerical function involved 
was simply plugging a wire into a switchboard to connect 
two telephones. The attention given to this routine proce-
dure resulted in the switching theory upon which much of
our present-day computer art is based. It has also resulted
in some 65 million telephones being connected—in large part
automatically—in the United States today. 

The purpose in citing this example is to emphasize that, 
far from being uninteresting, the application of machine 
techniques to clerical operations may stimulate the highest 
type of intellectual challenge and result in significant bene- 
fits. We have this opportunity in information work. 

Although our work at Bell Laboratories is concerned with 
all aspects of communication, I would like to tell you about
certain machine applications which bear specifically on docu- 
mentation techniques and problems of general interest to this 
conference. One of our problems, and I’m sure one of yours, 
is concerned with improving our knowledge of information 
requirements. This is a continuing problem which requires 
continuing effort in view of changing requirements and
conditions. To more clearly establish the information needs at
the laboratories, we recently sent out 4,700 questionnaires to 
our technical and management staff. This group represents 
about 43 percent of our 11,000 laboratories employees and in- 
cludes all scientific and technical personnel. We had the 
good fortune to have available an expert on the questionnaire 
technique who also was familiar with computers, and largely
because of his advice 90 percent of the respondents returned 
their questionnaires. Currently, we are in the process of 
preparing punched cards for the data supplied and we antici- 
pate using IBM equipment for correlation and analysis of 
the results. In due time we plan to publish a report on our 
experience in using machines for this purpose and copies will 
be available to those who wish them. 

Another problem which has been of increasing concern to 
us has been the lack of a satisfactory subject index to the 
internal technical reports produced by the laboratories staff. 
There are presently over 15,000 of these on hand and the rate 
of production is now nearly 2,000 annually. After careful 
study of the titles of these reports, we have decided that the 
words in the titles have sufficient technical information con-
tent to provide useful indexing entries. We have therefore
decided to prepare a permutation index for all reports issued 
during 1958 to test the utility of this type of indexing for 
our purposes. In a permutation index each word used as an 
index entry is shown in context with other words of the title. 
As such, the pertinence of the report indexed is more ap- 
parent to the user than when an index entry stands alone. 
This type of index may be prepared automatically from key- 
punched natural text. 
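
The permutation-index idea described above can be illustrated with a minimal modern sketch. It is not the laboratories' IBM 704 program; the function name, the stop-word list, the column width, and the two sample titles are all assumptions made for illustration.

```python
# Minimal sketch of a permutation (keyword-in-context) index.
# Assumptions: the stop-word list, column width, and sample titles
# are hypothetical and do not come from the report.

STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to"})


def permutation_index(titles, stop_words=STOP_WORDS, width=30):
    """Return one printed line per significant title word, sorted by that word.

    titles is a list of (report_number, title) pairs. Each line shows the
    words before the index word right-justified, then the index word and the
    rest of the title, then the report number.
    """
    entries = []
    for report_no, title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:()").lower()
            if not key or key in stop_words:
                continue  # common words are not used as index entries
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            entries.append((key, report_no, left, right))
    entries.sort()  # alphabetical by index word, then by report number
    return [
        f"{left[-width:]:>{width}}  {right[:width]:<{width}}  {report_no}"
        for key, report_no, left, right in entries
    ]


sample_titles = [
    ("R-1", "Switching Theory for Telephone Networks"),
    ("R-2", "Noise in Telephone Circuits"),
]
index_lines = permutation_index(sample_titles)
for line in index_lines:
    print(line)
```

Every index word starts in the same column, so a reader scanning down that column sees each word with the rest of its title on either side.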

We believe that by using the 120-character printout of 
the IBM 704 computer over 90 percent of our titles will be 
printed in full with each entry. We also expect to develop
a computer program which will utilize the full page width
for printout. Since the requirements for programing are 
somewhat different from those ordinarily encountered, and 
in order to become better informed as to the possibilities 
which computers offer in library work, a senior member of 
the library staff is currently being trained to be an expert 
programer. We expect this training and experience will re- 
veal further profitable areas for computer applications in 
library and information operations. In respect to our proj- 
ect for indexing internal reports, we estimate that for some 
1,800 reports published in 1958 there will be about 13,000 
index entries in the permutation index requiring 220 pages. 
Supplementary indexes for report numbers, research project 
numbers, and authors will require an additional 80 pages. 
Our plan is to have this done automatically and to have copies 
distributed to members of the technical staff for personal 
use at their desks. We believe in this way a research worker 
will be able to find information if he knows just one of the 
following: (1) any single word in the title, (2) the author’s 
name, (3) the general subject area of work in a specific tech- 
nical department, (4) the area of work covered by any
technical project, or (5) the report number. Even if he
knows none of these, he will usually find it useful to consult 
the index to determine what exists in his area of interest. 
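
The estimates quoted above imply a few derived figures, worked out in the short sketch below. Only the four input numbers come from the text; the variable names are illustrative assumptions.

```python
# Derived figures from the estimates quoted in the text.
reports = 1_800             # reports published in 1958
index_entries = 13_000      # estimated permutation-index entries
permutation_pages = 220     # pages for the permutation index
supplementary_pages = 80    # pages for report-number, project, and author indexes

entries_per_report = index_entries / reports
entries_per_page = index_entries / permutation_pages
total_pages = permutation_pages + supplementary_pages

print(f"{entries_per_report:.1f} index entries per report")
print(f"{entries_per_page:.1f} index entries per page")
print(f"{total_pages} pages in the complete index")
```

On these figures each title contributes about seven index words, and the complete desk copy runs to about 300 pages.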

In another area of library work, we have found that ma-
chines have been helpful in information handling. The
clerical operations associated with the preparation and main- 
tenance of lists and catalogs for certain types of library 
holdings were consuming too much staff time. Further, most 
of the work involved was repetitive in nature and very little 
of it was professional. The volume of work was so great,
however, that we found it necessary to assign professional
librarians to clerical tasks and even then we weren’t able to 
get the job done in reasonable time. At this point we made 
a systematic study into the whole process for keeping such 
records up to date and found that a Flexowriter automatic 
typewriter offered the solution. We are now using this 
equipment for the preparation of 10 lists which must be re- 
vised in part to reflect changes in holdings. The aggregate 
number of pages involved in these lists is 700. When a list 
is revised, the changed information constitutes less than 10 
percent of the total. The 90 percent which does not require 
change is prepared automatically from the Flexowriter tape. 
One of the byproducts of using this equipment was the com- 
plete elimination of much of the information contained in 
the former lists as well as many of the routines involved. 
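
The revision routine described above, in which only the changed entries are retyped and the rest is reproduced from the stored tape, can be sketched as follows. This is an illustration under stated assumptions, not a description of the Flexowriter itself: the function name, the keyed-entry layout, and the sample holdings are hypothetical.

```python
# Sketch of revising a stored list by supplying only the changed entries.
# Assumptions: entries are (key, text) pairs; changes maps a key to its new
# text, or to None when the entry is to be dropped. All sample data is
# hypothetical.

def revise_list(master, changes):
    """Return (revised_list, share_of_master_entries_reused_unchanged)."""
    revised = []
    reused = 0
    seen = set()
    for key, text in master:
        seen.add(key)
        if key not in changes:
            revised.append((key, text))          # copied from the stored list
            reused += 1
        elif changes[key] is not None:
            revised.append((key, changes[key]))  # retyped entry
    for key, text in changes.items():
        if key not in seen and text is not None:
            revised.append((key, text))          # newly added entry
    revised.sort()
    reused_share = reused / len(master) if master else 0.0
    return revised, reused_share


master = [(f"J{n:03d}", f"Holding {n}") for n in range(1, 11)]
changes = {"J004": "Holding 4 (revised)"}
revised, reused_share = revise_list(master, changes)
print(f"{reused_share:.0%} of the list was reproduced without retyping")
```

With one entry changed out of ten, nine-tenths of the list is carried over untouched, which matches the proportion reported in the text.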

In libraries there are many requirements to keep records. 
One such record, found in most libraries, is that maintained
for receipt and claiming of current issues of journals. At 
the Bell Laboratories we have a central serials record con- 
trol for well over 3,000 journal subscriptions. This record 
is required also for handling the binding of journals when 
volumes are complete. After being checked in at the central 
record, new issues are then forwarded to one of the several 
libraries for use. At each library a similar record has been 
maintained for those journals it receives, so that inquiries 
by patrons may be answered and to assist in collating issues 
for binding. To maintain these duplicate records at three of
our largest libraries requires the equivalent of one person 
working full time. It is our desire to use that staff time for 
more productive effort and we believe machines will make 
this possible without adversely affecting service. We also 
know that unless we use machines we will need more per- 
sonnel to handle a growing workload in this area of our work. 

Basically, we plan to substitute the technique of rapid 
transmission of data for that of storage. In addition, we 
are attacking the principle that in order to give service on 
a small portion of an information file, it is necessary to main- 
tain a complete file in several locations. Using this ap- 
proach we will eliminate the records maintained at three of 
our largest libraries and install TelAutograph transceivers 
at each location and at the central serials record control 
point. This will provide instantaneous communication over 
a four-station network at a cost which is less than that now 
required to maintain the duplicate records. By assigning
a serial code number to each journal title and using mne-
monic codes for frequently asked questions, we expect to 
facilitate data transmission between libraries and the cen- 
tral record control. When a library needs to get informa- 
tion from the central record, the coded inquiry is handwritten 
on the TelAutograph paper and received immediately at the 
central record. The record is consulted by a clerk and a 
reply is sent back to the library in the same fashion. The 
real advantage is that we gain much needed staff time for 
other functions and avoid the need for additional personnel 
for this function as our journal subscriptions increase in 
the future. 
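
The coded-inquiry scheme described above can be pictured with a small sketch. The mnemonic codes ("LR" for last issue received, "BD" for binding status), the serial numbers, and the record contents are all hypothetical; the report does not give the actual codes used.

```python
# Sketch of answering a coded inquiry against a central serials record.
# Assumptions: the codes "LR" and "BD", the serial numbers, and the record
# contents are hypothetical; the report does not list the real codes.

CENTRAL_RECORD = {
    "0412": {"last_received": "Vol. 38, No. 5", "at_bindery": False},
    "1077": {"last_received": "Vol. 12, No. 2", "at_bindery": True},
}


def answer_inquiry(message, record=CENTRAL_RECORD):
    """Answer a message of the form '<mnemonic code> <serial code number>'."""
    parts = message.split()
    if len(parts) != 2:
        return "INQUIRY NOT UNDERSTOOD"
    code, serial = parts
    entry = record.get(serial)
    if entry is None:
        return f"{serial} NOT ON RECORD"
    if code == "LR":
        return f"{serial} LR {entry['last_received']}"
    if code == "BD":
        status = "AT BINDERY" if entry["at_bindery"] else "NOT AT BINDERY"
        return f"{serial} BD {status}"
    return f"{serial} UNKNOWN CODE {code}"


print(answer_inquiry("LR 0412"))
print(answer_inquiry("BD 1077"))
```

Because each inquiry is reduced to a short code and a number, the single central file can serve every library without a duplicate record at each location.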

In addition to eliminating the posting of some 40,000 
entries of journal receipts each year, we will be able to use 
the TelAutograph for other communication purposes not 
associated with journal records. Several of these are im- 
mediately apparent and others appear quite promising. In 
general, such equipment is most useful if quick communica- 
tion of brief information is called for frequently or when 
a written record is preferred to oral communication. It also 
contributes to work efficiency by fixing responsibility for 
action, dispensing with idle conversation, and reducing errors 
and misunderstandings encountered in oral communication. 

At this point I would like to describe an experiment we 
conducted which was not successful. This in no way implies 
that the equipment is not useful for other purposes but from 
our experience we found it unsatisfactory for library opera-
tions. The equipment used was standard drum-type facsimile
transceivers which were installed in libraries within a 25- 
mile radius of each other. Transmission was through leased 
telephone circuits which were also used for telephone contact 
between locations. The purpose of the experiment was to 
determine the need for and practical applications of such 
equipment in transmitting information from one point to 
another. After a year of trial and exploration we concluded
that its continued use could not be justified for the following
reasons:

1. It could not transmit information contained in 
bound publications. 
2. Information on flat sheets required a scanning time 
of from 6 to 8 minutes per sheet. 
3. We did not have sufficient need to transmit flat copy 
to justify its cost. 
We also advertised the availability of the service to other 
departments of the laboratories but this did not increase the 
traffic noticeably. Our further conclusion was that flatbed 
facsimile equipment, if reasonably priced, would probably 
be very well adapted to library operations and could serve in 
lieu of interlibrary loan in many instances where urgency is 
involved. To date, we know of no flatbed equipment which 
meets our requirements. 

Although the facsimile transceivers used in our library
experiment were found to be impractical, it should be pointed
out that essentially the same equipment is being used success-
fully in another information-handling operation. Bell 
Laboratories is dispersed in 17 locations throughout the 
United States and, as might be expected, laboratories in 2 
or more locations may work on a common research or de- 
velopment task. A case in point concerns two laboratories 
450 miles apart which have need for current data on engi- 
neering specifications and drawings. Facsimile transceivers 
transmit the necessary information promptly and satisfac- 
torily at reasonable cost. Tied in with this equipment are 
IBM card data transceivers which provide automatic repro- 
duction of punched card data between the two locations. 
These machines are in regular use and prevent delays in work
progress.

Transmission of information by automatic equipment is 
one area of documentation where machines can be useful. 
There is still need for equipment which can readily and satis- 
factorily transmit the contents of a book at reasonable cost. 
Perhaps we shall see the day when this is possible, but it does 
not seem near at hand. In the meantime we shall probably 
have to be content with present methods of building vast 
collections of information or borrowing what we need from 
others. This, of course, contributes to the problem of space 
for the files which must be accommodated. The storage of 
information is an acute problem, not only from the point of 
view of adequate subject analysis and quick retrievability, but 
also with respect to physical handling and facilities. Much 
work is in progress on ways to improve indexing, classifica- 
tion, and retrieval of the subject content of printed informa- 
tion. In certain cases, however, there are types of documents 
which do not require the sophisticated subject controls 
usually associated with books and journals. Engineering 
drawings are in this category. 

At the Bell Laboratories, the number of drawings on file 
is about 2 million. To store and provide service on these 
drawings is costly and time consuming. The situation is 
aggravated by the need to constantly revise drawings and by 
new additions to the file. This situation presents an operat- 
ing problem of considerable dimensions and the laboratories 
recognized that this was a fact to be dealt with rather than 
just a problem. Consequently, in 1956, after a thorough in- 
vestigation of factors governing the preparation, updating, 
storage, and handling of engineering drawings, a trial was 
made to determine the feasibility of applying machine tech- 
niques to the operation. The trial covered 15,000 drawings 
and the results were so encouraging that an expanded pilot
program involving 200,000 drawings is now in progress.
This, in effect, is one of our largest current efforts to mech-
anize clerical operations associated with information han-
dling.

It was obvious that microfilm would reduce storage space
and offer advantages in maintaining files of drawings. It 
was also recognized that certain problems must be met if
microfilm was to be substituted for full-size copies of prints.
These were considered in detail before the decision was made 
to miniaturize the files. The pilot program includes about 
100,000 active drawings and approximately the same number 
which are inactive but must be kept as legal records. For 
those which are inactive, a master roll of 35-millimeter film 
is kept in storage and a duplicate is cut into frames and put 
in individual transparent envelopes along with the existing 
3-by-5-inch cards already used for record purposes. They 
are readily retrieved for the occasional need that arises for 
them and may be viewed in a microfilm reader or made into 
prints. 

Microcopies of active drawings are treated somewhat 
differently, these being mounted in aperture cards suitable 
for sorting on electric accounting machines. These cards are 
filed as produced and are immediately available for use by 
engineers. When a drawing is needed, it takes about 1 
minute for a clerk to make a duplicate film card from the 
master card and this can be viewed in one of numerous 
readers placed in strategic laboratory locations. If, on the 
other hand, an enlarged print is needed this can be provided 
as well. When either form is provided, it is for the engi- 
neer’s retention or other disposition. Copies are not
returned to the files service since additional requests will be
filled on the same basis. 

A large-scale file and reproduction service for engineer- 
ing drawings is an expensive operation. It is unlikely that 
mechanization of procedures can be obtained without sizable 
investment in the machines to be used. This equipment in- 
vestment must, of course, be measured against the cost of
doing the job without machines and in view of possible 
service improvements offered by a machine system. We 
have employed the following equipment in our pilot program:

1. A continuous flow-type microfilm camera;
2. An office-type film processor;
3. Test and inspection instruments (microscope, den-
sitometer, light box, splicer);
4. Electric accounting machines (card punch, verifier,
interpreter, reproducer);
5. Film mounter (for aperture cards);
6. Motorized tub files for storage;
7. Film duplicating machines;
8. Microfilm readers (18- by 24-inch screen);
9. Enlarger-printer.

Our experience to date leads us to believe that mechaniza- 
tion will not only provide a tremendous reduction in space 
requirements and offer much better service on engineering 
drawings but may also result in better drafting standards. 
Here again we received a useful byproduct as a result of 
undertaking mechanization. In the process of gearing up, we 
were able to put our files in much better order for effective 
use. 




The foregoing demonstrates some of the applications of 
automatic equipment for information handling which appear 
worthwhile. We have barely scratched the surface in ap- 
plying machine techniques for this purpose and in our con- 
tinuing program of mechanization others will be tried and 
used when it is possible to reduce the clerical drudgery as- 
sociated with information services. Although the examples 
cited may appear to be isolated attacks on specific problems 
of documentation, they are, in fact, part of a systematic plan 
to improve the general problem areas of documentation rep- 
resented by information requirements, control, storage, re- 
trieval, reproduction, transmission, and service. These seven 
areas constitute the framework within which we have prob- 
lems and must seek improvements. All of them offer possi- 
bilities for the use of machines. As documentalists we cannot 
ignore the value they have for improving the work for which 
we are responsible. As administrators we have a higher re- 
sponsibility to those who work with us and for us. Mr. 
Norbert Wiener has brought this forcefully to our attention 
in his excellent book on “The Human Use of Human Beings”
wherein he says, “* * * any use of a human being in which 
less is demanded of him and less is attributed to him than his 
full status is a degradation and waste. It is a degradation to 
a human being to chain him to an oar and use him as a source 
of power; but it is an almost equal degradation to assign him 
a purely repetitive task * * *, which demands less than 
a millionth of his brain capacity.” 

Machines can be useful in this respect as well. 


CHEMICAL ABSTRACTS SERVICE


Mr. D. B. Baker, director of Chemical Abstracts Service, an infor- 
mation service for chemists and the chemical industry which pro- 
duces “Chemical Abstracts” and its indexes, responded on February 
5, 1960, to the request of the staff for a report on the operations of 
the science information retrieval system being developed by that 
Service, as follows: 


Your letter of January 11 regarding activities of the staff 
of the Senate Committee on Government Operations kindly 
offers us an opportunity to outline briefly the science in- 
formation retrieval system being developed by our services. 
We appreciate this much. 

* * * * * 


In the search for better methods of handling scientific in- 
formation, especially by machines, the tendency may be to 
neglect abstracting and indexing services during crucial 
years ahead. It should be fully realized that abstracts and 
indexes are likely to continue for a long time to provide the 
main, everyday, inexpensive, useful, retrieval system avail-
able to all. For this reason I am attaching a copy of a
publication, “CA Today,” on the production of “Chemical
Abstracts” which you may find useful in the Committee’s
studies. 

In the final, you have our best wishes for continued success- 
ful studies in the science information programs and our 
promise for full cooperation to encourage and help in the 
broadest and most liberal manner the advancement of science 
in all its branches. 


CA Research and Retrieval Program 


The seven main projects which are being actively investi- 
gated by the CA Research Division are listed below, together 
with a brief account of the status of each. 

Project 1 (NSF supported).—The storage and retrieval of 
chemical data by mechanical methods: The proposals for 
this are, of course, shown in the submissions made to NSF 
prior to the granting of their support. 

A pilot run of 14,000 cards (comprising the whole of the 
organic fluorine compounds) has been prepared, represent- 
ing about 1 percent of the whole series of compounds which 
will ultimately be handled, many of which are now in pre- 
liminary card form. Each card contains references to the 
whole of the literature of each structure. These are being 
processed by variants of the following procedures to ascer- 
tain the simplest method of retrieval and storage. 

An IBM program by which the whole of the data can be
punched onto cards has been agreed between the CA
Research Group and IBM. The punched cards are to be
processed automatically to tapes and selection made on IBM 
1401. The output of selected data can be obtained by the
IBM 1401 printer or the Document Writer or, by means of
a series of tab numbers, can be furnished as Minicards or
hard copy. Part of the plant for this has been delivered,
part is on order. The plant for photographic files (aperture 
cards) has been specified, ordered, and part has been 
delivered. 

Thus, a whole mass of data on the structures of chemical 
compounds, either wholly or fractionally, can be correlated 
and questions answered. 

The final setup for this is envisaged as having the following 
subsections : 


1.1 Data gathering section (McBee files)
1.2 Structure files (IBM files)
1.3 Direct access structure files (IBM and aperture card
files)
1.4 Tape converters (1401 section):
1401 comparators
1401 Printer and Document Writer
1.5 Concept files and selectors for 1-6
1.6 Photographic files of abstracts (2 million documents)
giving replica aperture cards or hard copies.


The extension of 1-6 to original data, Beilstein and Gmelin 
summaries is being actively investigated. The production 
of a series of reference volumes starting with a “Lexicon of
Organic Fluorine Compound” is envisaged. Specialized in-
formation services are contemplated. 

Project 2 (NSF supported).—Concept codes and chemical 
semantics: A start has been made on the formation of a range 
of concept codes for the storage of data on properties of 
chemical substances other than their structural attributes. 
The concentration, so far, has been in the field of physiologi- 
cal activity. A draft code for this section is in preparation 
and a work study is being made in relation to the CBCC code. 

These codes are essential also to project 6, which will, in the 
pilot run, use the same concept code as for this project. 

Work on the general organization of chemical semantics 
has also been planned along the lines indicated in the NSF 
project application. 

Project 3 (NSF supported).—A Keyword Title Index to
CA has been planned. This is a permuted keyword index to 
be based on the IBM “KWIC” system. (See also NSF 
application). The status is as follows: The range of journals 
covered has been agreed. The first two issues of this pub- 
lication (Chemical Titles) were processed on the IBM 704 
and have been published. (Copies are in the files of the 
committee.) Two issues are planned for April and June 
after which the project will go into production if feasible. 




By the procedure (which bypasses abstracting and index-
ing) an index is produced from which chemists can obtain
a quick insight into what is going on; it is a “current aware-
ness” device.

Project 4.—Work studies on the regular indexes and for- 
mat of CA: The Research Division has under review the for- 
mat, production methods, work-study potentialities and other 
factors of the regular publications. A program of accelera-
tion has been initiated which has already resulted in a short- 
ening of the publication time and it is confidently believed 
that the full implementation of this program will make CA 
indexes current by 1962. 

Project 5.—New index production methods: During the 
last 6 months an experimental program has been carried out 
on the production of indexes by a radically new method, 
namely, the use of Varitype and line-camera plant. It 
has been demonstrated that an acceptable printing can be 
obtained and sample pages have been prepared. The neces- 
sary modifications in the production of index cards are now
being studied. 

It may be added that the perfection of this method of in- 
dexing would greatly diminish the time, cost, and trouble in 
the production of cumulative index which would then be
almost entirely a clerical procedure based on existing cards.

Project 6. RELPAS (Restricted Express Lists/Physiolog- 
ical Activity Section): A detailed work study is being 
made of a project for obtaining express data from primary 
publications in advance of abstracts. This procedure is re- 
stricted to chosen fields of importance—such, for example, as 
the pharmaceutical field which has been selected for the 
work study. This special service will supply lists of com- 
pounds (arranged by molform and linear notation) to- 
gether with indications of the physiological trials made on 
them. The same physiological concept code will be used as
in project 2. It is hoped, ultimately, to obtain the direct co- 
operation of laboratories and institutions where physiologi- 
cal and chemotherapeutic trials are made, to furnish their 
results direct and so anticipate primary publication. The 
extension of this to other fields will be merely a matter of re- 
sources in personnel, space, and money. 

Project 7.—Plans have been made and eros points 
agreed for a joint production with SOCMA (Synthetic Or- 
ganic Chemical Manufacturers Association) of a “Lexicon of 
Nonsystematic Names Used in Organic Chemistry.” This 
will cover some 30,000 names (about half of which are al- 
ready collected and have been punched onto IBM cards). 
The whole of the data of this compendium will be kept on 
punched cards so that the preparation of subsequent editions 
will be a matter of machine routine. There has been a 
great need of such a work and its production will fill a gap
in the documentation of chemistry. 

Conclusion.—In the above short account technical details 
have been avoided; in most cases there are confidential in- 
ternal reports on these projects which deal with the various 
methods it is proposed to use. 


On March 8, 1960, Mr. Baker further advised the staff that— 


Based on my personal observations of the science infor- 
mation services in Russia last October and November, I 
wish to assure you that this country is not lagging in its re- 
search and development of documentation and retrieval sys- 
tems. One only has to look at the amount of work going on 
in our country as described in the “Current Research and 
Development in Scientific Documentation” as issued by the 
National Science Foundation in Publication No. 5, dated Oc- 
tober 1959, to readily see that the American effort is many 
times that of the Russians in this field now. We want to 
keep it that way. Your committee has helped and can con- 
tinue to help immeasurably along these lines. 

I take this opportunity to enclose a copy of my report of 
the “Soviet Science Information Services” and one on “Let’s 
Set the Record Straight” which appeared in the January 11 
issue of Chemical and Engineering News, which may be of 
interest to you. If I can be of further assistance at any time, 
please do not hesitate to call on me. 
























Mr. Baker further advised the committee, on April 29, 1960, that— 


I am taking the liberty of sending sample copies of the 
first two issues of Chemical Titles, which publication is the 
result of the experimental work on permuted types of in- 
dexes as a valuable tool for scientists keeping informed of 
current developments in their fields. This development has 
been termed the “miracle of the decade” and has been said 
to be “the greatest thing to happen in chemistry since the 
invention of the test tube.” Many hundred interested users 
of this tool have now reported their reaction to it since the 
1st of April and we are hopeful to go into production on it 
this summer, although some experimental work needs to be 
done on it. This work is explained briefly under project 3 
of the projects supported by the NSF (p. 111). 




DOCUMENTATION INCORPORATED





































In response to the committee’s request, Mr. Mortimer Taube, presi- 
dent of Documentation Incorporated, submitted the following report 
on the science information retrieval systems developed by his organi- 
zation and in current operation: 


This is in reply to your letter of January 11, 1960, request- 
ing information concerning the science information retrieval 
systems which have been developed by our organization and 
which we are currently operating. 

I am forwarding herewith: 

(1) A brief description of the Cancer Chemotherapy 
National Service Center data processing project devel- 
oped and operated for the National Institutes of Health; 

(2) A brief description of Experimental Contract 
Highlight Operation (project ECHO) developed and 
operated for the Air Force Office of Scientific Research ; 

(3) A brochure describing the Man-Machine Infor- 
mation Center developed and operated for the Office of 
Naval Research. 

In addition, I am forwarding herewith a technical paper 
on project ECHO which describes not merely the operation 
of this system but gives an account of our general research 
activities in the development of this system and the Cancer 
Chemotherapy National Service Center data processing 
project. 

You indicate in your letter that arrangements can be made 
for a member of your staff to visit us. This would please us 
very much, since we think an actual demonstration of the 
operation of these several systems is a much more effective
illustration of the kind of work we are doing than a pile of 
dead paper. If you or any member of your staff will call us, 
we will make arrangements for a visit at any time which is 
convenient to your staff. 

As you know, there are a great many claims and counter- 
claims in this field, but we ourselves have never abused the 
forum presented by your committee to claim unique and won- 
derful accomplishments. We have always felt that the in-
vestigations your committee is making in this field are of ut-
most importance and we have several times indicated our 
readiness to cooperate in any way we can to advance your 
investigations. Nevertheless, I must at this time point out 
that the very fact of congressional interest in this field has 
created a serious problem. It has brought into this field a new 
group of people without background or experience. These
people have indicated to your committee their eager readiness 
to solve the Government information problem if only some- 
body will give them enough money to do so. The danger here 
is that enthusiasm for progress may lead to new and uncon- 
sidered action which overlooks the solid accomplishments in 
this area. The systems we have devised are certainly not the 
only possible systems. Our own realization that new systems 
are possible lies at the basis of our own continuing research 
program which, we hope, will enable us to make the systems 
we operate ever more serviceable to the needs of science and 
industry. We should like to point out that the material that 
we are submitting describes not “pie in the sky,” but actual 
operating systems which can be used as models for extended
services which can do much to solve the Government’s prob- 
lem in this field. We also feel it necessary to point out that 
these problems will not be solved by Government alone but 
only by the largest degree of cooperation between Govern- 
ment, learned societies, and private industry. In the plans 
which we have proposed to the Navy for the extension of the 
Man-Machine Information Center, such cooperative activity 
is the heart of the plan. 

We can honestly say that there is not a single major operat- 
ing information system that we have not studied in planning 
our own efforts. Any system proposed by anyone should be 
looked upon with suspicion unless it is based on a similar 
study. We have in the past made constant representations 
to this effect to the National Science Foundation. We have 
not, in these representations, sought contracts or an increase 
in business, but have asked only that the National Science
Foundation not overlook our work in planning its own ac-
tivities. The one field in which we think there was an initial 
tendency to overlook this work is the field represented by our 
Project ECHO and its relationship to the plans of the Na- 
tional Science Foundation for developing a Physical Sciences 
Information Center. 

You mention in the staff study which accompanied your 
letter that your staff is evaluating a proposal now being con- 
sidered by the National Science Foundation to set up a Physi- 
cal Sciences Center within the Smithsonian Institution, 
modeled upon the present Bio-Sciences Information Center. 
This, we feel, would be a most serious error and we would 
appreciate an opportunity to present this matter in detail to 
your committee, on the assumption that your committee in- 
tends to study this question. We would like to do so in the 
presence of representatives of the National Science Founda- 
tion and the Smithsonian Institution. 





CANCER CHEMOTHERAPY NATIONAL SERVICE CENTER DATA 
PROCESSING PROJECT 


A. Documents in the system 


Subject: Chemical-biological test data on synthetic com- 
pounds and natural products (e.g., antibiotic filtrates, plant 
extracts, tissue extracts, etc.). 

Type: Summary reports on these materials screened for 
biological activity against tumor-impregnated mice and tis- 
sue cultures. 

B. Personnel involved 


Operating the system: There are eight keypunch-verifier 
operators, one computer operator, three EAM operators, six 
clerks, one biologist, two programers, and one senior systems 
engineer who operate, supervise, and control the system. 

Staff served: The staff of the Cancer Chemotherapy Na- 
tional Service Center, their consultants, and the research 
people who supply the materials under test are directly served
by the system.




C. Analysis of material in system 


Depth of indexing: From three to six index entries are 
assigned to each test plus a numerical computer address to 
which future entries will be chained. A standard index of 
codes is used. Fourteen code fields are punched on a stan-
dard 80-column card. The codes are alpha-numeric. The
size of the vocabulary is approximately 100 terms at the
present.


D. Operation 


Equipment used: A standard IBM EAM installation, 
keypunch-verifier units, as well as the 305 RAMAC random 
access computer are used. 

Steps in processing a document: Documents are received 
in groups from the different screeners daily. The groups are 
edited to insure uniformity of reporting and conformance to 
protocol. Each document is then keypunched and verified. 
Data coding is done by the keypunch-verifier operators di- 
rectly from the original document. The punched cards are 
sorted for proper sequence loading into the computer, where 
computations, data reduction, and determination of test 
status of each material is made. The assignment of “status” 
codes by the machine requires a program of some elegance 
because of the number of status possibilities a chemical may 
assume, depending upon the equating of current test results 
with all previous test results. When a search is required the 
item numbers are pulled from the master index. The index 
cards are fed into the computer, thereby producing the de- 
sired printing of summary data. 

Number and type of searches: Between 800 and 1,000 
searches are made each month. These range from searches 
made by our staff for file correction purposes to large searches 
requested by the Cancer Chemotherapy National Service
Center staff. A program is now being written for the IBM
9900 that will coordinate the system, increasing the variety of 
searches possible. Searches will then be available to deter- 
mine what items are showing activity on a certain tumor sys- 
tem and not another. 
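
The planned search for items active on one tumor system and not on another amounts to a set difference over per-system lists of item numbers. The Python below is a minimal sketch under stated assumptions: the table of test results, the tumor-system labels, the item numbers, and the function names are all hypothetical and are not drawn from the Center's files or programs.

```python
from collections import defaultdict

# Hypothetical test records: (item number, tumor system, showed activity?).
# Every value here is invented for illustration.
RESULTS = [
    (1001, "system A", True),
    (1001, "system B", False),
    (1002, "system A", True),
    (1002, "system B", True),
    (1003, "system B", True),
]

def active_items_by_system(results):
    """Group the item numbers that showed activity under each tumor system."""
    index = defaultdict(set)
    for item, system, active in results:
        if active:
            index[system].add(item)
    return index

def active_on_but_not(results, wanted, excluded):
    """Items active on `wanted` and not active on `excluded` (a set difference)."""
    index = active_items_by_system(results)
    return sorted(index[wanted] - index[excluded])

print(active_on_but_not(RESULTS, "system A", "system B"))  # [1001]
```

Item 1002 is dropped because it is active on both systems; only item 1001 is active on the first system alone.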

Number of terms used per search: Currently from one to 
three terms are used in a search routine. 


E. Size of system and rate of growth


Some 24,000 materials are “in process” at the present time 
with an in-and-out traffic of about 1,000 transactions involving 
100 materials per day. 


Established: The present system became operative March 
1, 1959. 


EXPERIMENTAL CONTRACT HIGHLIGHT OPERATION 
(PROJECT ECHO) 
A. Data in system 

Subject: Contracts let by the Air Force Office of Scien- 
tific Research in various fields of basic research. 

Type: An item or contract description consists of a 
“header” block listing institution, title, investigator, and 
AFOSR monitor, followed by a series of “transaction” lines 
each of which deals with a particular period for which the 
contract was written or renewed. Transaction lines include 
such details as Purchase Request No., Contract No., Type of 
Contract, Project, Task, RPO Area, DOD Subject Field, 
Starting and Expiration Dates, AFOSR Division, Annual 
Rate, Funds, Fiscal Year of Funds. Details are numeric, 
alphabetic, or a mixture of the two. In addition to this ad- 
ministrative and fiscal description of the contract, exhaustive 
subject indexing is performed on each item. 

B. Personnel involved 

Operating the system: Exclusive of keypunch and machine 
room support, two people have been retained for the various
phases of obtaining and analyzing AFOSR documents, re- 
ducing the data for input into the system, accomplishing the 
subject indexing, and answering requests. Part of the time 
of a programer has been utilized in order to make the most 
efficient use of the IBM 305 (RAMAC) as an indexer and oc- 
casional store of the descriptive data. 

Staff served: The primary beneficiaries of the system are 
the scientists and administrators of AFOSR seeking better 
contract control. Working through AFOSR, other groups, 
such as AFSWP, have made occasional use of the system to 
determine active contracts in specified subject areas. 

C. Analysis of material 

Depth of indexing: Twenty pieces of information can be 
included in the combination of headers and one transaction 
line. All of these appear in any printed tabulation of con- 
tracts but not all serve as access points. Type of Contract, 
Project, Task, RPO Area, DOD Subject Field, and AFOSR
Division are truly indexed. It is possible to run summations
on the various kinds of funds involved. Alphabetic or nu-
meric sorts are possible on institution, investigator, monitor, 
type of institution, and fiscal year of funds. The subject in- 
dexing averages around 18 terms per contract but in individ- 
ual cases can go nearly twice this high. 

Control of terminology: Authority lists are kept for insti- 
tutions, investigators, monitors, and the subject vocabulary. 
The subject vocabulary is constructed on the basis of words 
used by AFOSR monitors in writing detailed justifications 
for the Department of Defense and in writing the careful, 
binding stipulations included in the actual legal text of the 
contract. The various numbers and letters that are involved
in a contract description are essentially the result of precod-
ing on the part of the Department of Defense or AFOSR.
For instance, project No. 9750 involves propulsive energy 
sources; RPO Area 803A involves electronics; DOD Sub- 
ject Field K is meteorology; Type of Contract FP indicates 
fixed price; AFOSR Division YN is their Nuclear Physics 
Division. These abbreviations, both letters and numbers, 
are punched as they are, according to standard IBM pattern, 
in fixed positions on the cards, but without any further cod- 
ing. Later, for indexing purposes, all contracts designated 
“803A” or assigned the subject Heat Transfer, for instance, 
will have their item numbers posted on cards behind codes 
which will stand for 803A and Heat Transfer. However, 
these indexing codes appear only on the IBM 9900 
(COMAC) cards used for indexing. They do not appear on 
the cards containing the contract data. Codes added by us 
to the latter group of cards, for sorting purposes, are an item 
or contract description number (assigned in straight sequence 
as the new contract comes in) which unites all the cards deal- 
ing with a specific item; a numeric card code indicating what 
header information the card contains or whether it is a trans- 
action card and if so, which one; a numerical, single digit, 
type of institution code (i.e., nonprofit, industrial, foreign, 
etc.), and a four digit (two letters, two numbers) alphabetic 
sorting code (on the name of the institution). 

Relationships among index items: Indexing is maintained 
on cards designed for the COMAC. Logical relationships 
among subject terms or indexed descriptive data are reduced 
to Boolean algebra equations and manipulated on this ma- 
chine, ultimately providing a card listing item numbers of 
those contracts satisfying the given conditions. If, for exam- 
ple, all contracts were desired that treated of stability prob- 
lems with delta wings and that had been designated as RPO 
Area 806A, the procedure would be to select from the index
files, using your code dictionary, the groups of cards contain- 
ing the postings for Stability, Delta Wings, and RPO Area 
806A, insert them in the COMAC and run a logical intersec- 
tion on their contents. The results would be punched out on 
an answer card which could then be inserted in the RAMAC 
to obtain a print-out of the contracts fulfilling these con- 
ditions. 
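
The delta-wing request can be sketched as an intersection of posting lists. The Python below is an illustration only and does not model the COMAC itself: the `POSTINGS` table and its item numbers are hypothetical, and clear English terms stand in for the index codes.

```python
from functools import reduce

# Hypothetical postings: index term -> item numbers of contracts posted to it.
POSTINGS = {
    "Stability": {12, 47, 103, 256},
    "Delta Wings": {47, 88, 103},
    "RPO Area 806A": {47, 103, 300},
    "Heat Transfer": {12, 300},
}

def logical_intersection(postings, terms):
    """Return the item numbers posted under every one of the given terms."""
    groups = [postings.get(term, set()) for term in terms]
    if not groups:
        return []
    return sorted(reduce(set.intersection, groups))

answer = logical_intersection(
    POSTINGS, ["Stability", "Delta Wings", "RPO Area 806A"]
)
print(answer)  # [47, 103]
```

The resulting list plays the part of the answer card: it holds only item numbers, which would then be used to look up and print the full contract descriptions.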




Size of vocabulary: The total number of subject terms at 
present is 2,516. The total number of indexed descriptive 
terms is at present around 100. Both are capable of erratic 
growth as the program is expanded. 


D. Operation 


Equipment: IBM 305 (RAMAC), IBM 9900 (COMAC);
and, at various times, keypunches, sorters, tabulators, alpha-
betic interpreters.

Steps in process: The information handled is acquired 
from various documents and is in various degrees of firmness. 
Descriptive information is transferred to a so-called input 
sheet which is diagramed and numbered for use of the key- 
punchers. There it is transferred to punched cards, three 
cards for the header information and one card for each trans- 
action. At that stage the cards are ready to undergo any 
sorting that might prove necessary or are ready to be fed 
into the RAMAC. The subject indexing is handled in a sepa- 
rate routine. Pertinent subject terms are written on tracing 
cards having the appropriate item number. In the punch 
room separate cards are made for each assignment of a sub- 
ject—the item number is posted to the term number. The 
resultant deck is consolidated into a COMAC deck through 
the use of the RAMAC. Each card can contain 12 postings. 
A term such as “Quantum Mechanics” might necessitate five 
or six cards to provide for all postings. Updating an exist- 
ing COMAC deck with a new deck is performed by running 
logical operations on the two decks with the COMAC. In- 
dexing of descriptive data is achieved as a function of input 
into the RAMAC. The resultant deck is handled in the same 
way as the subject deck. 
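
The card arithmetic and the deck update can be sketched as follows. This Python is a rough illustration under stated assumptions: the text says only that "logical operations" are run on the two decks, so the use of a union here is an assumption, and the term numbers, item numbers, and function names are hypothetical.

```python
POSTINGS_PER_CARD = 12  # from the text: each card can contain 12 postings

def to_cards(item_numbers, per_card=POSTINGS_PER_CARD):
    """Split one term's item numbers into sorted, card-sized groups."""
    ordered = sorted(set(item_numbers))
    return [ordered[i:i + per_card] for i in range(0, len(ordered), per_card)]

def update_deck(existing, new):
    """Merge two decks (term number -> item numbers) by logical union.

    The union is an assumption; the inputs are left unmodified.
    """
    merged = {term: set(items) for term, items in existing.items()}
    for term, items in new.items():
        merged.setdefault(term, set()).update(items)
    return merged

# Hypothetical term numbers and item numbers, for illustration only.
existing_deck = {501: set(range(1, 61))}   # 60 postings under term 501
new_deck = {501: {61, 62, 63}, 502: {7}}

deck = update_deck(existing_deck, new_deck)
print(len(to_cards(deck[501])))  # 6 (63 postings at 12 per card)
print(to_cards(deck[502]))       # [[7]]
```

A heavily posted term with 63 postings fills six cards, in line with the five or six cards mentioned for a term such as "Quantum Mechanics."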

Questions are of various kinds. Tabulations of contracts 
have been requested by foreign country, by monitor, by spe- 
cific institution, by amount of money, by termination date, 
by specific subjects or combinations of subjects. 


E. Size of system

The system contains at present approximately 1,200 con- 
tracts, active and completed. New contracts are added at 
the rate of about 20 to 25 per month. However, new or cor- 
rected transaction lines number in the hundreds per month. 
The system will have been in experimental operation for 
nearly a year by March 1960. 

F. Authority files


The authority files, maintained manually, are, of course, 
auxiliary to the main system. 


G. Publications or reports 


“An External Index to a Computer Store of Items and 
Transactions as Illustrated by Project ECHO,” (AFOSR 
TN 60-8), a report prepared for the Directorate of Mathe- 
matical Sciences, Air Force Office of Scientific Research, 
under Contract No. AF 49(638)-91, December 1959. 




E. I. DU PONT DE NEMOURS & CO.


Mr. J. S. Sayer, engineering department, E. I. du Pont de Nemours
& Co., advised the staff, on January 19, 1960, in response to the
committee’s request for a report on the operations of its information
retrieval program, as follows: 


We would be pleased to have Mr. Eugene Wall contact 
you at Washington so that he could discuss with you our 
experience and views on data and information processing. 
Mr. Wall is responsible for technical consultation in the field 
of documentation with our several operating organizations. 
He has initiated a number of working systems and par- 
ticipated actively in the development of technology. While, 
as a consultant, he is not in a position to define for you or 
your committee company policy in this area, he can and is 
free to discuss our experience, state of technology, and bene-
fits, both present and future, from our documentation activity.


Mr. Eugene Wall, engineering service division of the company, 
briefed the staff at length relative to programs in the science in- 
formation retrieval field which he had reviewed on behalf of his 
company, at which time he outlined various aspects of the studies 
he had been conducting with the objective of utilizing the best systems 
and equipment now available to the engineering service division. 
Under date of March 30, 1960, Mr. Wall submitted the following 
résumé of his comments on information storage and retrieval prob- 
lems, the results of his studies on behalf of the Du Pont Co., and 
his opinions as to those activities which should (or should not) be 
supported by the Federal Government. 


The discussion (which follows) first describes the back- 
ground and experience of the writer; this is for your benefit 
in evaluating the validity of the comments contained herein. 
Next, there is a discussion of what we consider to be funda- 
mental considerations; frequent reference is made to the 
attached paper, “A Practical System for Documenting Build- 
ing Research,” of which you have several copies already. 
Finally, some suggestions are submitted in regard to the 
writer’s opinions as to those activities which should (or 
should not) be supported by the Government. 


Background and experience of writer 


The problem with which we are concerned is that of the 
efficient and effective storage and retrieval of information. 
The writer has for the last 4 years been concerned primarily 
with developing solutions to this problem. Prior to that, he 
was one of many who were finding it increasingly difficult 
to utilize, in solving present engineering and scientific prob-
lems, the knowledge developed in the past.

The writer graduated from the University of Missouri 
in 1944 (B.S. in chemical engineering), worked as a process 
and product improvement engineer at the Baltimore titanium 
pigment plant of E. I. du Pont de Nemours & Co. for several 
months, was in the U.S. Navy until 1946, returned to the Du
Pont Baltimore plant until 1949, and acted as instrument
(i.e., automatic control) consulting engineer for Du Pont’s 
central engineering department until 1950. He supervised 
process and equipment development work at Du Pont’s ti- 
tanium metal plant at Newport (Del.) until 1954; acted 
as senior process development engineer at Du Pont’s Seaford 
(Del.) nylon plant until 1956; and has since been super- 
vising research, development, and consulting work in the 
field of information storage and retrieval as developed and 
practiced in the Du Pont engineering department and in 
several other departments of Du Pont. 

During these 4 years, the work of our group has been 
sufficiently successful so that the various departments of the 
company have found it advantageous to assign approximately 
50 people full time (and about 200 others part time) to oper-
ational activities growing out of our research; the degree
and extent of this application are both increasing steadily. 
We feel that this success, such as it is, has been the result 
of heavy attention to fundamental considerations, to care- 
ful definition of problems, and to as much openmindedness 
as we could muster in applying alternative solutions to the 
defined problems. 


Fundamental considerations 


Our research, development, and application work has led 
us to recognize a number of facts which may be of interest 
to the Committee on Government Operations. Perhaps fore-
most among the facts is that there is no panacea—no system
universally applicable in all its detail to the storage and 
retrieval problem in all environments. Second, there are 
no absolutely unique storage and retrieval systems, despite 
all the sound and fury which abound and all the claims and 
counterclaims which are being made. All systems are, must 
be, related in principle, because they all have the same basic 
problems to solve. This applies to systems employing hier- 
archical classifications, subject headings, “semantic factor- 
ing” (the Western Reserve development), “Uniterms”, etc. 
Many developers and operators of systems believe their sys- 
tems to be unique, and loudly proclaim this; unfortunately, 
the systems for which the claim of uniqueness is made tend 
to be relatively ineffective and uneconomical, because the 
designers and operators have had closed minds and have been 
unable to utilize the good ideas of others. 

Thus all systems, if they are to be at all effective, must 
operate on the same basic principles. It is unfortunate that 
many practitioners in the art have not been able to distin- 
guish between principles and techniques, between strategy 
and tactics. Different environments do require that different 
tactics be employed, and how effective and economical a 
particular system is depends upon how well the universal 
basic principles are applied and upon how well the techniques 
(tactics) employed match the requirements of the environ- 
ment. 




Third, no matter how well (or poorly) the basic principles 
are applied and no matter how well (or poorly) the tech-
niques match the specific environment, it is possible to “make
a system work.” However, the economics of a system are 
affected markedly by how well these matters of principle and 
technique are handled, especially by how well principles are 
considered. But, given enough time and effort (i.e., money), 
all systems can be made to work; of course, well-designed 
systems can be made to work better (at the same cost) than 
can poorly designed systems. Further, to obtain a given in- 
crease in effectiveness, the incremental cost will be less for 
a well-designed system than for a poorly designed one. 

Therefore, what is really needed are ways for choosing 
how best to apply basic principles and how to match avail- 
able techniques to the environments at hand. In our opin- 
ion, these matters have been given too little attention by many 
system designers. Strangely, this lack of attention has 
seemed to be at least as marked in academic circles as in 
commercial or governmental circles. 

There are two truly basic considerations involved in deter- 
mining what sort of information it is that must be handled. 
These considerations are described in detail in the writer’s 
attached paper, which was presented to the fall conference 
of the Building Research Institute in November 1959. 
Briefly, these basic matters are— 

1. Time delay (or “ease of feedback”) in commu- 
nication: This permits a very essential differentiation 
between retrieval and processing. 

2. Degree of abstractness of the information involved:
This permits an equally essential differentiation between
data and intellectual concepts (i.e., “information”).

Lack of recognition of these two factors has perpetuated 
in many minds a confusion between such things as data proc- 
essing and information retrieval, or between data retrieval 
and information retrieval, or between data processing and 
data retrieval. While all these operations are related and 
may overlap in minor considerations, they are sufficiently dif- 
ferent to require use of quite different techniques. 

For example, general-purpose computers are excellent data 
processors, but until now they have been, at best, extremely 
uneconomical as information retrievers. The reasons for this 
have been apparent for some time, but only recently have two 
other organizations and ourselves seen a way to overcome this 
difficulty; no one has yet had time to prove the practicability
of this new idea, but it is so simple—as are all truly creative 
advances—that success seems assured. 

In short, the choice of an information retrieval machine 
is a tactical matter, not a strategic matter. The machine 
should be made to fit the system, not vice versa. A well-de- 
signed system can be mechanized to any degree which the en- 
vironment may require, and as requirements change, the 
machine employed can be changed without modifying the 
basic system design. This will not be true for a system de-
signed for one specific machine. Thus the glamour of mech-
anization must not be allowed to induce one to compromise 
principles. No matter what system is mechanized, the ma- 
chine can do no more than can a human being—given enough 
time, money and effort. 

Our attached paper attempts to define the basic problems 
of information retrieval (pp. 8-12). These definitions have 
now been widely (but not yet universally) accepted by other 
practitioners in the art. Once the problems are defined, 
basic principles of solution are not difficult to detect, as the 
attached paper points out in regard to the principles of vo- 
cabulary prescription and redundancy, which are the two 
basic routes to solution of three of the four basic problems. 
The paper shows that the use of redundancy is preferable to 
the use of vocabulary prescription for information retrieval 
systems, and vice versa for data retrieval systems. 

Redundancy is shown to be, at present, best and most 
cheaply obtained by using the “thesaurus approach” to in- 
dexing, especially if it is desired to control system effective- 
ness at a consistent and high level. Redundancy, however, 
necessitates the use of conceptually short terms (i.e., descrip- 
tors or index entries) and this intensifies the fourth and last 
of the basic problems—that of syntactics. Adequate solu- 
tions to the syntactical problem are described on pages 22-26 
of the attached paper. It is shown that this problem can be 
solved equally well by these techniques irrespective of the 
actual arrangement of the index. It is demonstrably true,
for example, that relationships among indexed concepts can 
be made as specific in a “Uniterm-type” system as in a “se- 
mantic factored” (Western Reserve) system, and also just 
as easily, if not more easily. 

The matter of security is a very important tactical consid- 
eration. Our experience has been that no security rules or 
regulations need be compromised by an information storage 
and retrieval system. In fact, a properly arranged index 
(specifically, the “prefiled” type described on pages 27 and 
28 of the attached paper) is inherently difficult to compro- 
mise security-wise because the stored information is frag- 
mented so completely in the index. Security can be made 
even more effective when such an index is mechanized be- 
cause the indexing terms will be symbolized by nonsignifi- 
cant codes rather than by “clear” English; also, security can 
be controlled by controlling access to the machine. From 
the retrieval phase of index search onward through the other 
phases of retrieval, conventional security precautions con- 
cerning titles, abstracts, and the documents themselves can 
be applied as usual. 
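
The use of nonsignificant codes in a mechanized index can be sketched as follows. This Python is a hedged illustration, not a description of any Du Pont system: the class name `CodedIndex`, the four-digit serial codes, and the sample terms and item numbers are all assumptions made for the example.

```python
import itertools

class CodedIndex:
    """Index whose postings are filed under nonsignificant codes.

    The term-to-code dictionary is held apart from the postings, so the
    postings alone show only codes and item numbers, never clear English.
    """

    def __init__(self):
        self._serial = itertools.count(1)
        self.dictionary = {}  # clear-English term -> code (access controlled)
        self.postings = {}    # code -> set of item numbers

    def code_for(self, term):
        """Return the term's code, assigning the next serial code if new."""
        if term not in self.dictionary:
            self.dictionary[term] = f"{next(self._serial):04d}"
        return self.dictionary[term]

    def post(self, term, item_number):
        """File an item number behind the code standing for the term."""
        self.postings.setdefault(self.code_for(term), set()).add(item_number)

    def search(self, term):
        """Look up a term through the dictionary and return its item numbers."""
        code = self.dictionary.get(term)
        return sorted(self.postings.get(code, set()))

index = CodedIndex()
index.post("Heat Transfer", 17)
index.post("Heat Transfer", 42)
index.post("Corrosion", 17)

print(sorted(index.postings))         # ['0001', '0002']
print(index.search("Heat Transfer"))  # [17, 42]
```

Anyone holding only `postings` sees fragmented item numbers under meaningless codes; a search requires the dictionary as well, which is where access control would be applied.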

Beyond this, considerations in system design become so 
superficial and tactical in nature that the advantages and 
disadvantages of the numerous alternatives (with respect to 
any given environment) can be detected and evaluated even 
by those relatively inexperienced in the art. Thus, on the 
surface, different systems may seem to be quite different in 
nature, yet each may fit its purpose quite well. Nevertheless,
if they are well-designed, they will still operate on the same
basic principles (just as all motorized vehicles for travelling 
on land operate on wheels, despite their otherwise dissimilar 
appearances). 

Suggestions

With the aforementioned background information in mind, 
I shall now set forth some suggestions as to what the Federal 
Government should and should not do. You will understand 
that these are my own personal suggestions based upon my 
own knowledge and experience and upon that of my group. 
They are not to be construed in any way to be the view or the
policy of E. I. du Pont de Nemours & Co. 

The Government should support— 

1. Research on indexing methods: Such as how better 
(or best) to obtain redundancy and how better to solve 
the syntactical problem (how best to develop a mutually 
exclusive, collectively exhaustive, small and simple set 
of role indicators). Further, perhaps there is a means 
better than redundancy for solving the viewpoint, gen- 
eric and semantic problems. 

2. Actual operational systems within Government: 
With careful monitoring for comparing effectiveness, 
costs and benefits, being sure to distinguish between the 
effects of application of principles and the degree of suc- 
cessful matching of techniques to the environments. 
This evaluation is not an easy nor short task. 

3. Education within Government, industry, and edu- 
cational institutions, particularly on the differences be- 
tween data and information and between retrieval and 
processing. This would clear up much of the squabbling 
now going on and would prevent much of the ineffective 
effort now being placed on applying data processing 
techniques and machines to the information retrieval 
problem. 

The Government should not support— 

1. Machine development. Let manufacturers do 
this; they will as soon as they can be told what is needed, 
as soon as some of the confusion and bickering can be 
cleared up. No major breakthroughs are needed in the 
field of machine development. 

2. “Exotic” or complex coding schemes, such as “syn- 
thetic” languages or the like. All these are gilding of 
the lily and are conscious or unconscious attempts to ap- 
pear erudite and unique. “Clear” English or simple, non-
significant codes are sufficient to permit effective and
economical operation of any system, be it manual or
mechanized.

3. A massive national information center. At the pres- 
ent state of the art, this would be an operation too costly
and too slow and uncertain in its implementation. 
Rather, Government should encourage the development 
of compatible centers for the various technologies— 
centers which could communicate easily with each other.




Finally, those in the Government should beware of the 
“messiahs,” one of whom comes along every few months. 
These “saviors” are characterized by their grandiose claims 
to being the “best,” or to having a universally applicable 
system, or to having a machine which will solve all the 
problems. They are also characterized by their tendency 
to be effective propagandists for their system and by having 
a complicated system. It has been my experience that such 
people can spring from almost anywhere—industry, educa- 
tional institutions, or even government itself. 


APPENDIX 
A PRACTICAL SYSTEM FOR DOCUMENTING BUILDING RESEARCH 1


Dr. Taube and his associates have described in their paper 
(1) a number of information retrieval systems both proposed 
and operational. They also noted that there has been a great 
amount of discussion—both verbal and written—concerning 
the merits and demerits of all these and of other systems. 
All this discussion has to date generated much heat—but 
little light. In fact, onlookers at many conventions of us 
documentalists might well liken those meetings to conventions 
of witch doctors—whereat each witch doctor can prove that, 
in his own village, his own way of curing illnesses is the only 
valid way. If a modern doctor listened to the arguments of 
the witch doctors, he would probably be able to detect in 
each proponent’s techniques some element of medical truth— 
albeit well mixed in with superstition. He would probably 
also note an underlying sameness in valid techniques among 
all the conference participants and would note that the dif- 
ferences among techniques generally were insignificant 
variables of superstition. 

Too many of us documentalists today are quite like the 
witch doctors in the example. We can’t agree because we 
have no fundamental background in theory. In fact, some 
of us show no interest at all in fundamentals; we are more 
interested in the perhaps temporary success of our techniques 
in our own little area of interest—which we may assume to 
be typical. We lack understanding of environmental varia- 
bles, just as the witch doctors lacked understanding of bio- 
logical and psychological variables. A few of us assert that 
we have developed “universally applicable” techniques which 
everyone should standardize upon. 

In this paper, however, the author wishes to at least at- 
tempt to don the headdress of an analytical, research-minded 
witch doctor-documentalist, and as such to pose a question. 
In this state of affairs, how can the Building Research In- 
stitute—relatively a novice in the documentation field—ever 
hope to winnow fact from superstition, and true general ap-
plicability from partisan pride? For there are some truths
at hand in this field; there are some generally applicable
fundamentals. But the word is “fundamentals,” not “tech-
niques.” To draw an analogy to military operations, it might
be said that there is a fundamental strategy but that tactics
should vary with the environment. Thus, in all well-designed
information retrieval systems, there are fundamental ele-
ments of “sameness,” despite all the sound and fury which
have enveloped them.

1 Eugene Wall, engineering service division, engineering department, E. I. du Pont de
Nemours & Co., Inc., Wilmington, Del.; published in abbreviated form in Library Journal,
vol. 85, No. 5, Mar. 1960.

Therefore, it would seem that the Building Research In- 
stitute would be well advised to gain an understanding of 
strategic fundamentals, and with this understanding it will 
be found possible to assess correctly the validity—and 
value—of alternate tactical techniques. In other words, it 
is suggested that the fundamental problems of information 
storage and retrieval be defined, so that the capabilities of 
proposed problem solutions may be compared in light of the 
basic characteristics of the problems. In this business, as in 
most others, it seems that correctly defining the problem may 
well turn out to be more than half the task. 

Accordingly, let us proceed first to problem definition. 
It is believed that the definitions to be set forth are not 
parochial nor provincial; they are not particularly original 
with the author nor with those with whom he has been 
working. Rather, bits and pieces have been gleaned from 
nearly all the workers in this field and have (we think) 
been fitted together like a jigsaw puzzle into a meaningful, 
generalized entirety. After problem definition, there will be 
considered broad alternatives for solutions to the problems 
and, finally, some possible tactics which may be used to imple- 
ment these generalized alternatives. 

It is apparent that the problem with which we are faced 
is a problem in communications. Specifically, it is a prob- 
lem in improving communications among three sorts of in- 
dividuals or groups: 

1. The originator of information—he who develops 
the information and he who writes it down. 

2. The (let us call him) indexer—he who decides how 
the information is to be stored away so that it can later 
be retrieved. 

3. The searcher for information—he who has a prob- 
lem on which he needs help. 

If it be agreed that we are faced with a communication 
problem, it must then be decided: “What sort of a communi- 
cation problem?” Communication problems may be of 
many kinds—e.g., acoustical, psychological, sociological, lin- 
guistic, mechanical, etc. Obviously, we cannot hope to solve 
all the problems of communication—nor need we. It does 
appear, however, that at least two basic facets of nearly all 
communicative problems are significant to the Building Re- 
search Institute during its consideration of information 
storage and retrieval. 







The first of these considerations is what may be called 
feedback of information. Actually, the existence of “feed-
back” is a matter of degree; this is one dimension of a con-
tinuum (fig. 1). For example, “a conversation forms a two-
way communication link; there is a measure of symmetry
between the parties, and messages pass to and fro. There is
a continual stimulus-response; remarks call up other remarks,
and the behavior of the two individuals becomes concerted, 
cooperative, and directed toward some (communicative) 
goal” (2). 

FIGURE 1

The Communicative Continuum

[Chart with two axes, the “feedback” dimension and the “abstractness” dimension. Items shown, in order of decreasing abstractness: music, art; humor, poetry, satire; economics, research results; routine reports, standard test data; log tables, multiplication tables.]


Slightly further along the continuum, the reading of a 
newspaper represents a unilateral, noncooperative communi- 
cation (except when the reader writes letters to the editor or 
cancels his subscription). Still farther along the continuum, 
the author of a technical paper seldom has immediate oppor- 
tunity to obtain “feedback” from his peers in the technical 
field in which he is interested. Of course, near the end of the 
continuum, there is essentially no way for the recipient to 
form a “cooperative link” with the originator of informa- 
tion. For example, an archeologist deciphering a stone in- 
scription receives no help (except for nearby artifacts) from 
his forebears other than the signs carved upon the stone. 

In other words, the presence or absence of “feedback” in 
the communication process, and the “process time-constant” 
(or “lag coefficient”) of any “feedback,” determine to a large
extent how easy it is to achieve effective communication,
which is the transmission of meaningful knowledge from one 
source to one or more receivers. Accordingly, this “feed- 
back” dimension of the communicative continuum must, of 
necessity, be considered. 

The other principal dimension of the communicative con- 
tinuum is perhaps more familiar. It is concerned with the 
degree of abstractness of the information being communi-
cated. What is meant by abstractness? Here is not meant
“how far the information in question is abstracted from basic 
considerations,” in the sense that a table of logarithms is 
highly abstracted from a basic theory of numbers. Rather, 
when speaking of the “degree of abstractness” dimension of 
the communicative continuum, in this context it is best 
thought of as the degree of abstract thought required in em- 
ploying the information involved. Hence, a table of log- 
arithms would be near the “low” end of the abstractness di- 
mension. By the same token, music has a very “high” degree 
of communicative abstractness, followed probably by art, 
humor, and poetry in approximately that order. 

Now we have a two-dimensional communicative continuum 
under consideration; one dimension is “degree of abstract- 
ness” and the other deals with “feedback.” On the “degree 
of abstractness” scale, there is little practical interest, insofar 
as storage and retrieval are concerned, in the extreme ends 
(in either such things as music or log tables). Near the 
middle of the spectrum, however, there can be distinguished 
technical ideas (or information) and, near the “low” end, 
data—and we might well be interested in the part of the spec- 
trum bounded by these limits. 

Along the “feedback” dimension, there can be distin- 
guished such activities as conversation, message transmission, 
information (or data) processing, the reporting of results of 
calculations, retrieval, etc. In this paper, consideration will 
be limited to the retrieval “zone” and to those matters perti- 
nent to retrieval. 

Further, because we are interested in research, our consid- 
erations will be limited principally to the information or 
“idea” portion of the “abstractness” spectrum, because this is 
the zone in which most research information falls. This is 
not to say that research does not develop data; rather, it is a 
recognition that the data developed during research usually 
require interpretation by words—words which stand for ideas 
not easily quantifiable—and this returns us to consideration 
of the area of communications noted in the figure. You will 
note that this area does not include such things as data proc- 
essing, which is quite a different matter. This distinction be- 
tween data and information was also made in Dr. Taube’s 
paper (1). 

Accordingly, let us proceed to discuss problems arising in 
information (or idea) retrieval. Of course, in dealing with
any situation, there are three sorts of problems which arise. 
This is true in the realm of communications as it is in all
other problem areas. First, there are the technical or in-
tellectual problems which must be solved—and if these can-
not be solved adequately, then there is no real point in 
worrying about any other sort of problem. However, pre-
suming that solution of the technical problems of documenta-
tion is possible, we must be concerned with economic prob-
lems. Technical solutions must be economically attractive—
or, at least, they must not carry with them an economic
penalty. Finally, there are relationship or political prob-
lems.

In line with the thesis that the technical problems must be 
solved in any event (and probably must be solved before the 
economic or political problems can be attacked effectively), 
let us first examine the specific technical problems in this 
area of communication—in information retrieval. There ap- 
pear to be but four technical or intellectual problems signifi- 
cant to this area. These are the problems of viewpoint, 
generics, semantics, and syntactics. The first two of these 
problems are characteristic of human thought, as will be 
illustrated shortly, and the last two are characteristic of the 
particular language involved—in our case, English. 

First consider the problem of viewpoint. Every individual 
is a unique composite of the combined, cumulative effects of 
his education, experience, background, environmental con-
ditioning, and relationships with other individuals. Accord-
ingly, individuals contemplate objects, ideas, facts or images
with different viewpoints. “How you look at it” depends on
how you got where you are when you look at it. It is not
difficult to see that the word “oil” may be variously inter-
preted to mean: petroleum, lubricant, road surfacing mate-
rial, cooking material, vehicle for medicines, fuel, source of
other fuels, perfume base, hair dressing, paint vehicle, polish,
etc. Many words are similar to “oil” in that it is perfectly
reasonable to assign them to more than one logical class.
During indexing, it is necessary to insure that variations in
viewpoint among originators, indexers and users of informa-
tion will not result in “missing” vital information during
retrieval. 

The second, or generic, problem is concerned with family 
trees of concepts. Because each concept implies broader 
concepts, a literature search for information referring to 
broad concepts of knowledge should effectively retrieve in- 
formation referring to narrower but related or subordinate 
concepts. (When one saws off a big branch of a tree, one 
normally expects all the little branches which are attached to 
the one big branch to be removed as well—and all in one saw- 
ing operation). For example, retrieval of all information 
pertaining to the chemical family “halides” should also, 
and automatically, result in obtaining all information on the 
members of that family; namely, bromides, chlorides, flu-
orides, and iodides.

It is normal for a concept to belong to more than one gen-
eric tree; “dichlorodifluoromethane” is a narrower concept
within several broad concepts, such as “chlorinated hydro-
carbons” and “fluorinated hydrocarbons.” In turn, “chlori-
nated hydrocarbons” and “fluorinated hydrocarbons” are
both properly “halogenated hydrocarbons” and “dichloro-
difluoromethane” is a “Freon.” 2

When combinations of concepts must be considered, the 
family tree relationships are complicated considerably, 
resulting in intertwined, entangled branches which are by 
their very nature extremely difficult to separate from each 
other. 
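
The generic problem can be illustrated with a minimal sketch in a present-day programming language (Python). The family-tree relations follow the examples in the text; the document numbers and the names `BROADER`, `INDEX`, `narrower_terms` and `generic_search` are hypothetical and serve only for illustration. A search on a broad term is widened to every term beneath it, and a term may hang from more than one branch.

```python
# Hypothetical generic ("family tree") relations: term -> its broader terms.
# A term may have several parents, as "dichlorodifluoromethane" does.
BROADER = {
    "bromides": ["halides"],
    "chlorides": ["halides"],
    "fluorides": ["halides"],
    "iodides": ["halides"],
    "chlorinated hydrocarbons": ["halogenated hydrocarbons"],
    "fluorinated hydrocarbons": ["halogenated hydrocarbons"],
    "dichlorodifluoromethane": ["chlorinated hydrocarbons",
                                "fluorinated hydrocarbons"],
}

# Hypothetical index: document number -> terms assigned by the indexer.
INDEX = {
    1: {"chlorides"},
    2: {"iodides"},
    3: {"dichlorodifluoromethane"},
    4: {"halides"},
}


def narrower_terms(term):
    """Return the term together with every term below it in any tree."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parents in BROADER.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def generic_search(term):
    """Retrieve documents indexed under the term or any narrower term."""
    wanted = narrower_terms(term)
    return sorted(doc for doc, terms in INDEX.items() if terms & wanted)


print(generic_search("halides"))                   # [1, 2, 4]
print(generic_search("halogenated hydrocarbons"))  # [3]
```

Sawing off the "halides" branch thus brings the bromides, chlorides, fluorides and iodides with it in one operation.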

The third, or semantic, problem involves the relationships 
between concepts themselves and the symbols for concepts 
(that is, the words or terms used). Simply, the semantic 
problem is concerned with the relationships between words 
and their meanings. In this area, we become concerned with 
synonyms, near-synonyms, and homographs. Homographs 
must be distinguished from each other because they are 
spelled the same, but sometimes have different pronuncia-
tions and always have different meanings—e.g., “flashing”
(weather protection) and “flashing” (intermittent light). 
Other examples of semantically confusing words include 
“base,” “color,” “lead,” “finish,” “tank,” and “cracking.” 

Another significant semantic problem is that there are 
situations in which two or more words have identical or very 
similar meanings (depending upon—viewpoint). For exam- 
ple, within the Du Pont Co., the operation of “moving 
liquids through pipes” is generally referred to as “transfer- 
ring.” In some cases, however, it is referred to as “trans- 
porting.” If pairs of words like “transferring” and “trans- 
porting” are permitted to remain in the vocabulary of any 
storage and retrieval system without provision for advising 
searchers that the information desired may be found under 
more than one term, then the searcher will retrieve only that 
pertinent information which is included under the term he 
happens to use in his search; he will not retrieve that infor- 
mation which is listed under the synonymous or near-synony- 
mous term. Any retrieval system must detect the situations in 
which more than one word or phrase may be used to describe 
a specific concept and make provisions for cross-reference so 
that a searcher will be able to retrieve essentially all perti- 
nent information on the concepts in which he is interested. 
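
The cross-reference provision just described may be sketched as follows. This is an illustration under assumptions, not a description of any system in use: the posting lists, document numbers and function names are hypothetical, and only the pair “transferring” and “transporting” comes from the text.

```python
# Hypothetical groups of synonymous or near-synonymous terms.
SYNONYM_GROUPS = [
    {"transferring", "transporting"},
]

# Hypothetical posting lists: term -> numbers of documents indexed under it.
POSTINGS = {
    "transferring": {101, 102},
    "transporting": {103},
    "pumping": {104},
}


def equivalents(term):
    """Return every term cross-referenced with the given term."""
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}


def search(term):
    """Retrieve documents listed under the term or any cross-referenced term."""
    documents = set()
    for candidate in equivalents(term):
        documents |= POSTINGS.get(candidate, set())
    return sorted(documents)


print(search("transporting"))  # [101, 102, 103]
```

Without the cross-reference, a search on “transporting” alone would return only document 103.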

The last problem is one of syntactics. Syntax relates to 
the ordering or arrangements of words and the changes in 
meaning of a group of words which may result from modi- 
fying the relative order of words within the group. Con- 
sider “one-eyed, one-horned flying purple people eater.” 
This problem is particularly important in information sys- 
tems which employ conceptually short terms—i.e., wherein 
retrieval is accomplished by using terms which usually stand 
for single ideas or concepts. For example, coordinating the 
terms “fabrication” and “clamps” retrieves items which 
refer both to fabrication using clamps and to fabrication of
clamps. Similarly, “steam” and “heating” retrieve infor-
mation on the heating of steam and on heating using steam.
Of course, there are also several other specific types of syn-
tactical problems, which will be described in more detail
later.

2 Trademark for Du Pont’s fluorinated hydrocarbons.

These, then, are the technical or intellectual problems 
which must be solved adequately in order that storage and 
retrieval of information may be effective and economical. 
Note that inadequate solution of the first three problems 
(viewpoint, generics, and semantics) results generally in the 
loss of information during retrieval, whereas inadequate solu- 
tion of the syntactical problem results in obtaining nonperti- 
nent information during retrieval. 

These definitions of the four technical problems of infor- 
mation storage and retrieval have proved valid during 3 years 
of quite broad experience within the Du Pont Co. They 
have also been agreed to by many other practitioners in the 
field. Unfortunately, there have not yet been found such 
useful definitions for the economic and political problems. 
Hence, solutions to the technical problems have had to be 
flexible enough to handle undefined economic and political 
environments. It is suspected that most other organizations, 
including the Building Research Institute, may face this 
same situation. 

Under these circumstances, it seems that there are only two 
basic approaches for solving the four technical problems. 
The first of these is, in effect, the prescription of a vocabulary 
for storage and retrieval. The second is the use of redund- 
ancy in storage and retrieval. Note, however, that these two 
basic approaches themselves constitute the extreme ends of 
another continuum. In practice, no system employs a pre- 
cise, nonredundant vocabulary nor does any employ a com- 
pletely “nonprescribed” vocabulary. Prescribed or not, the 
vocabulary consists of the complete set of terms used to de- 
scribe the subject matter of the stored documents, as Dr. 
Taube pointed out in his paper (1). 

Examples of prescribed vocabularies are formal hierarchi- 
cal classifications, such as the “Dewey Decimal System,” the 
Library of Congress classification, the “Universal Decimal 
Classification,” and many small and local classifications. The 
authority lists used by many librarians are also examples of 
prescribed vocabularies. 

Please note that here the term “classification” is not used 
as a synonym for “shorthand description” or the like; rather, 
it is used in the restricted sense of formal hierarchical ar-
rangement, wherein ideas are included as subclasses of
broader ideas, etc., and are formally arranged, for retrieval 
purposes, in such a manner. 

In light of the previous discussion on the “abstractness” 
dimension of the communicative continuum, it seems apparent 
that a prescribed vocabulary (such as a formal hierarchical
classification) operates most effectively at the “low abstract- 
ness” end of the continuum and becomes less and less effective 
as the highly abstract portion of the continuum is ap-
proached. Here, meaning does not reside within the word
or symbol alone but rather (if you will) depends principal- 
ly upon viewpoint. Accordingly, it would be expected that 
prescribed vocabularies would be used for data retrieval sys- 
tems, but not for information (or idea) retrieval systems. 
And, because the Building Research Institute is concerned 
with research information, which ordinarily must carry prose 
along with any numbers (i.e., data) in order to make the 
numbers meaningful, the Building Research Institute must 
also be concerned with a somewhat abstract portion of the 
communicative continuum. Thus, prescribed vocabularies 
would not be generally applicable to Building Research in- 
formation systems. 

It is apparent that prescribed vocabularies, such as formal 
hierarchical classifications or authority lists, will be advan- 
tageous if one or more of the three following situations pre- 
vail, as they always do in well-designed data retrieval 
systems:

1. The collection of documents is small, so that they do not 
need to be subcategorized too greatly. 

2. The field of technology covered by the stored documents 
is narrow, so that the prescribed vocabulary can be small. 

3. The number of potential users of the stored information 
is small, so that the conventions necessary in using the pre-
scribed vocabulary may be policed effectively.

Please note that we are not objecting to formal classifica- 
tions when they are used for organizing one’s thoughts or in 
getting an overall view of a total situation. We are merely 
saying that the utility of a classification is much less in in- 
formation storage and retrieval than it is under those other 
circumstances. 

Can the other alternative—redundancy—be employed in 
storing and retrieving information or ideas? Certainly, to 
many people, redundancy is a nasty word. To them, it im-
plies something unnecessary, repetitive, verbose, or the like.
To the communications engineer, however, redundancy is 
something quite different. While it may still be repetitive in 
a certain sense, redundancy is essential in order to insure that 
a signal is not lost in the noise which may exist in the com- 
munication channel. 

Even in everyday conversation, we are unconsciously using 
a great deal of redundancy. It has been said that the En-
glish language is more than 50 percent redundant and that if
someone could speak with no redundancy we would be un- 
able to understand what he would say. So, redundancy does 
have its virtues; what we have come to think of as being un- 
desirable is excessive redundancy. And, “excessive” depends 
upon the circumstances—depends upon the uses to which we
wish to put this principle of redundancy. In some instances, 
it would be advantageous to employ redundancy, whereas in 
other instances, it would be avoided insofar as possible. 

There are two ways in which redundancy may be profitably 
employed in the storage and retrieval of information. The
first is that one may index redundantly; i.e., one may index
under all probable viewpoints, at all probable levels of gen- 
erality of viewpoint, and with all probable terminology which 
may be employed by the originators, indexers, and users of a 
system. Alternatively, however, one may index by taking 
into account only immediately apparent viewpoints, generics 
and semantics, but we may search redundantly; i.e., by
translating each individual inquiry into a number of different 
inquiries, using search terms standing for probable view- 
points, probable levels of generality of viewpoint, and prob- 
able terminology. This means that each individual inquiry 
will result in a number of questions—each question being 
composed of a permutation of different terms, which might 
have been used to index the desired information. 
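
Redundant searching of this kind may be sketched as below. The table of alternative terms is hypothetical (its entries are borrowed from the “oil” and “transferring” examples given earlier), as are the names `ALTERNATIVES` and `expand_inquiry`. Strictly speaking, the sketch forms every combination of one alternative per concept (a Cartesian product) rather than a permutation in the mathematical sense.

```python
from itertools import product

# Hypothetical alternatives a searcher might consider for each concept:
# other viewpoints, other levels of generality, other terminology.
ALTERNATIVES = {
    "oil": ["oil", "petroleum", "lubricant"],
    "transferring": ["transferring", "transporting"],
}


def expand_inquiry(concepts):
    """Translate one inquiry into several questions, one per combination."""
    choices = [ALTERNATIVES.get(concept, [concept]) for concept in concepts]
    return list(product(*choices))


questions = expand_inquiry(["oil", "transferring"])
print(len(questions))  # 6
for question in questions:
    print(question)
```

The number of questions is the product of the number of alternatives for each concept, which is why the search cost per question weighs so heavily in the choice discussed next.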

The choice of whether redundancy is employed at the input 
or at the output end of a system depends purely upon eco-
nomics. It depends especially upon the search cost per ques-
tion, which depends in turn upon the number of searches
requested per unit time. Against this must be balanced the 
accession rate of new documents into the system and the unit 
input cost for redundant indexing. For systems which hold 
a large collection, especially systems which have a high ratio 
of references to accessions, the designers should consider care- 
fully the desirability of employing redundancy at the input 
side. 

In either situation, the use of redundancy results in a 
“continuous” solution to the problem of cost versus effective- 
ness. That is to say, if cost is plotted against effectiveness 
of retrieval, a smooth curve will result (fig. 2). Costs will
be zero at zero effectiveness and will increase with a steadily 
increasing slope, approaching infinity as 100 percent effec- 
tiveness is approached. There will be no discontinuities 
on the curve—as often occurs when attempting to employ a 
prescribed vocabulary (e.g., when a classification must be 
entirely redone in order to keep up with changes in tech- 
nology). 

It is believed that the relationship between benefits and 
effectiveness is “S-shaped”—i.e., benefits rise slowly as ef- 
fectiveness is first increased from zero and then more and 
more rapidly, yet the curve finally flattens out as 100 percent 
effectiveness is approached. In practice, one would wish to 
operate at the effectiveness level which provides the greatest 
difference between benefits and costs. 
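
A small numerical sketch may make the choice of operating level concrete. The functional forms and constants below are assumptions chosen only because they have the shapes described: a cost curve that starts at zero and climbs without limit as 100 percent effectiveness is approached, and an S-shaped benefit curve. They are not measurements from any system.

```python
def cost(effectiveness, k=10.0):
    """Assumed cost curve: zero at 0, ever steeper, unbounded near 1."""
    return k * effectiveness / (1.0 - effectiveness)


def benefit(effectiveness, b_max=100.0):
    """Assumed S-shaped benefit curve: slow start, rapid rise, flat finish."""
    return b_max * (3 * effectiveness ** 2 - 2 * effectiveness ** 3)


def optimum(steps=1000):
    """Scan effectiveness levels below 100 percent for the largest net benefit."""
    grid = [i / steps for i in range(steps)]  # 0.000 .. 0.999
    return max(grid, key=lambda e: benefit(e) - cost(e))


best = optimum()
print(best, benefit(best) - cost(best))
```

With these particular assumptions the best level falls well short of 100 percent effectiveness; different constants would move it, which is the sense in which the solution is “continuous” and adjustable.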

At present, unfortunately, the choice of this optimum level 
of effectiveness must be largely a subjective one, because bene- 
fits cannot yet be quantified well. The desirability of the 
“continuous” solution to the technical problems, however, 
must not be minimized, because it provides a true capability 
of adjusting storage and retrieval operations to any economic 
facts of life which may develop at a later date. This is cer- 
tainly one significant advantage, among many, of using re- 
dundancy rather than vocabulary prescription to solve the 
viewpoint, generic and semantic problems.





FIGURE 2

Economics of Retrieval

[Chart: dollars plotted against effectiveness, from 0% to 100%. The optimum effectiveness is marked at the point of maximum difference between benefits and costs.]


If it is presumed, based on this reasoning, that redundancy 
rather than vocabulary prescription is probably the most 
advantageous principle to employ for storage and retrieval 
of information, there is then created another problem. How 
can this redundancy be obtained? If one elects to employ re- 
dundant indexing, where will one find this paragon of an 
indexer who can use all probable viewpoints, generics and 
terminology? If one elects to employ redundant searching, 
where will one find a person who can compose inquiries 
employing all the permutations of probable viewpoints, 
probable generics, and probable terminology?

To a minor degree this problem has already been solved, 
for there is in existence a device known as Roget’s “Thesau- 
rus.” Unfortunately, this “Thesaurus” includes many terms 
in which we are not interested and excludes many technical 
terms in which we are vitally interested. As it now stands, 
its usefulness is limited. However, the principle is still valid. 
An appropriate technical thesaurus would serve as a “word 
guide list,” a “word reminder list,” for the indexers and re- 
trievers of information. Such a thesaurus would indicate 
synonymous terms, generic relationships among terms, and 
other relationships among terms. 

The possibility of using a thesaurus for solving the view- 
point, generic and semantic problems is not original with us 
at Du Pont. Its advantages have already been described by 
several others, including Bernier (3) of Chemical Abstracts,
Heumann (3) of National Research Council, Luhn (4) of
IBM, and Taube (5) of Documentation, Inc.

The creation of such a technical thesaurus is not merely a 
theoretical possibility. We in the Du Pont Engineering De- 
partment have constructed a thesaurus for use in information 
storage and retrieval (fig. 3). We have found it essential for 
achieving system effectiveness. 

When first considering the creation of a thesaurus, one is 
inclined to wonder whether all the hundreds of thousands 
of words in the English language must be included, and if so, 
one is appalled by the magnitude of the task. But, in fact,
the vocabulary of science is quite limited. Numerous inves-
tigators have pointed out that the vocabulary of any one
field of technology is limited to approximately 5,000 terms, 
that the vocabulary of all technologies is limited to approxi- 
mately 20,000 terms, and that the whole of human knowledge 
could be expressed in less than 40,000 terms. These vocab- 
ularies are, of course, descriptive vocabularies; they do not 
include names of people, places or things—nor even such 
things as names of chemical compounds, for which there ap- 
pears to be no growth limit. This category of terms known 
as names, however, is one which causes only minor difficulties, 
in practice, in the operation of information systems. 

Presuming existence of a thesaurus, how might it be used?
If redundant indexing is to be employed, the indexer might
first list as indexing terms those words or phrases which are
used by an author to describe the information he is attempt- 
ing to communicate. The indexer could add words or phrases 
of his own further to describe the information in the docu- 
ment at hand. He could then refer to the thesaurus to obtain 
generic and other terms related to those terms already listed 
as index entries. Depending upon the value of the informa- 
tion contained in the document, the indexer could use the 
thesaurus to whatever extent might be appropriate. This 
determination would be a subjective one. The indexer must 
obviously be competent to a considerable degree in the field 
of knowledge which he is indexing. 

In this manner, the indexer could describe information 
from all probable viewpoints, all probable levels of general- 
ity of viewpoint, and with all probable terminology. Of 
course, the thesaurus would have to be a living, growing docu- 
ment—one which would be subject to continual modification 
and updating. Such modifications, however, could be made 
easily on a piecemeal basis; this contrasts with the wholesale 
modification which must be made periodically to formal clas- 
sifications and the like. 
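
The indexing procedure just outlined may be sketched as follows. The entries are borrowed from the sample thesaurus page in figure 3; the data layout, the function name `redundant_index` and the parameter `max_related` are hypothetical. As a simplification, every listed relationship is treated alike, and the relationship codes of the figure are ignored.

```python
# "See" references: a nonpreferred term -> the term actually used.
SEE = {
    "SAL AMMONIAC": "AMMONIUM CHLORIDE",
    "S.P.": "SPLAY POINT",
}

# Related terms listed under each entry (relationship codes ignored here).
RELATED = {
    "SAGGING": ["BENDING", "CREEP", "DEFLECTING", "DEPRESSING", "SETTLING"],
    "SALARIES": ["PAYMENTS", "COMPENSATION", "PAYROLLS", "WAGES"],
}


def redundant_index(terms, max_related=None):
    """Build index entries from the author's and indexer's own terms.

    max_related limits how many thesaurus terms are added per entry; the
    indexer would set it according to the value of the document.
    None means that all listed terms are added.
    """
    entries = []
    for term in terms:
        preferred = SEE.get(term, term)
        if preferred not in entries:
            entries.append(preferred)
        for extra in RELATED.get(preferred, [])[:max_related]:
            if extra not in entries:
                entries.append(extra)
    return entries


print(redundant_index(["SAL AMMONIAC", "SAGGING"], max_related=2))
# ['AMMONIUM CHLORIDE', 'SAGGING', 'BENDING', 'CREEP']
```

Because the thesaurus is ordinary data, a new term or relationship is added by changing one entry, which is the piecemeal modification contrasted above with the wholesale revision of a formal classification.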








FIGURE 3

EXAMPLE OF A TECHNICAL THESAURUS

383100  S. AND R. DIVISION-SEE SALVAGE & RECLAMATION DIVISION
432700  S.B.B. ACID-SEE SULFOBENZOYLBENZOIC ACID
417700  S.P.-SEE SPLAY POINT

381700  SABINE RIVER
    PO  378300  RIVERS
    PO  482100  WATERWAYS

381900  SADDLES/PACKINGS/
    GT   44900  BERL SADDLES
    RT   80300  CERAMICS
    RT  316400  PACKINGS
    RT  363800  RASCHIG RINGS
    RT  392200  SEPARATION

382000  SADDLES/SUPPORTS/
    RT  111800  CRADLES
    RT  201200  FOUNDATIONS
    RT  316700  PADS
    RT  436000  SUPPORTS

382100  SAFETY
    RT    1600  ACCIDENTS
    RT   61000  BURNS/INJURY/
    RT  101300  CONDITIONING
    RT  191000  FLAMEPROOFING
    RT  209700  GLASS
    RT  219600  HAZARDS
    RT  220100  HEALTH
    RT  239400  INJURIES
    RT  331400  PHYSIOLOGICAL
    RT  350900  PREVENTION
    RT  355900  PROTECTION
    RT  356400  PSYCHOLOGICAL
    RT  383600  SANITATION

382300  SAGGING
         42300  BENDING
    RT  112800  CREEP
    RT  124100  DEFLECTING
    RT  129400  DEPRESSING
    RT  393000  SETTLING

 19200  SAL AMMONIAC-SEE AMMONIUM CHLORIDE

382400  SALARIES
    PO  320700  PAYMENTS
    RT   98700  COMPENSATION
    RT  320800  PAYROLLS
    RT  480200  WAGES


The “thesaurus approach” carries with it one possible 
penalty. It forces one to employ conceptually “short” in- 
dexing terms. Here, “short terms” means “unit concept 
terms,” as distinguished from “terms standing for combina- 
tions of concepts.” The use of conceptually “short” terms is 
necessary because we have found it impossible to construct a
thesaurus in which the terms stand for combinations of con-
cepts. The same inherent, logical factors have forced every
lexicographer, from Samuel Johnson on, to use terms which 
stand essentially for unit concepts in all dictionaries and the- 
sauri ever created. The reason is, of course, that terms 
standing for combinations of concepts (such as phrases, 
Dewey Decimal numbers, or the like) are so specific in mean- 
ing that they can hardly be defined in terms other than 
themselves. For example, what single term could stand for 
a single concept which would be generic (in the inclusive 
sense) to the combination of concepts expressed by the phrase 
“evaluation of foamed plastic insulation under variable 
climatic conditions”?

One is thus forced into the use of conceptually “short”
indexing terms—often single words and almost invariably
terms which stand for unit concepts. This is not necessarily
bad; there are actually some distinct advantages. The prin-
cipal one is that an enormous body of knowledge can be de-
scribed with relatively few terms—just as we can com-
pose an almost unlimited number of English words using
only 26 letters. In order to do this, however, our terms must 
be combined at the output side of the system rather than at 
the input side. In essence, such a technique consists of find- 
ing the information which exists at the logical intersection 
of terms and, accordingly, systems which employ such 
techniques are known as coordinate or concept coordination 
systems. However, information signified by the logical 
union of terms can also be retrieved in such systems. 
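The coordination just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system the author operated: the term names follow the text, while the item numbers and the function names `coordinate` and `any_of` are invented for the sketch.

```python
# Hypothetical postings: each unit-concept term maps to the set of item
# numbers indexed under it. The item numbers are invented for illustration.
postings = {
    "cooling": {1, 2, 3, 7},
    "water": {1, 2, 3, 5},
    "air": {3, 7},
}


def coordinate(terms, postings):
    """Items indexed under every one of the terms (logical intersection)."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def any_of(terms, postings):
    """Items indexed under at least one of the terms (logical union)."""
    return set().union(*(postings.get(term, set()) for term in terms))


both = coordinate(["cooling", "water"], postings)   # items 1, 2 and 3
either = any_of(["air", "water"], postings)         # items 1, 2, 3, 5 and 7
```

The terms are combined only when a question is asked, which is the sense in which coordination takes place at the output side of the system.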

Another advantage of using unit-concept terms is the sys- 
tem simplicity which results; the complexity of information 
storage and retrieval systems depends largely upon term 
length. “Short term” systems, besides being simpler, also 
tend to be less bulky and easier to use. 

The use of unit-concept terms, however, does carry with 
it one rather serious disadvantage. The syntactical problem 
is intensified. Coordination of the terms “cooling” and 
“water” will result in retrieval not only of the documents 
dealing with “cooling water” but also of those dealing with 
“water cooling”—an entirely different idea. In other words, 
the use of unit-concept terms intensifies the “noise” problem 
in the communication channel; the tendency will be to re- 
trieve more nonpertinent information than if the terms were 
“longer.” Note, however, that the system at least “fails 
safe”; the probably pertinent information is retrieved, but 
unfortunately an amount of nonpertinent information is also 
retrieved. This may, in large systems, be quite undesirable. 

What, then, is the solution to the last of the four technical 
problems—to the syntactical problem? Again, an empirical 
approach is suggested—an approach which has as its goal 
adequacy rather than perfection. Let us construct a logical 
model of an index such as has been described. This model 
will be a binary rectangular matrix (figure 4). Each horizontal
row of the matrix will stand for an item of information
stored in the system. Items may be journal articles,
books, reports, file folders or the like. Each of the vertical 
columns of the matrix will stand for an indexing term, with 
each such term symbolizing a unit concept. Then, to dis- 
tinguish the terms by which any particular item is indexed, 
place a “1” at the appropriate term-item intersections. All 
intersections which are not occupied by a “1” may be pre- 
sumed to be occupied by a “0.” 


FIGURE 4 


“Model” of Index 


TERMS:      AIR    COOLING    WATER

ITEMS
  1          0        1         1
  2          0        1         1
  3          1        1         1


In this simplified example, item 1, as indicated by the top- 
most horizontal row of the matrix, discusses “cooling water.” 
Item 2, the middle horizontal row of the matrix, discusses 
“water cooling.” Item 3 discusses “cooling of air with 
water.” 

Thus, in our hypothetical matrix, there are three terms. 
These will be “water,” “cooling” and “air.” We will not be 
concerned here with the viewpoint, generic and semantic 
problems—only with the syntactical problem. Note that if 
all information on “cooling water” is desired, the inquirer 
will be referred to all three items, whereas only items 1 and
3 are directly pertinent.
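The model of figure 4 may be written out directly. The following Python sketch is an assumption-laden illustration (the names `TERMS`, `MATRIX`, and `search` are hypothetical); it shows that coordinating “cooling” and “water” cannot separate item 2 from items 1 and 3, because the three rows are identical in those two columns.

```python
# Columns of the binary matrix, in the order shown in figure 4.
TERMS = ["air", "cooling", "water"]

# Rows of the binary matrix: item number -> one 0/1 entry per term.
MATRIX = {
    1: [0, 1, 1],  # discusses "cooling water"
    2: [0, 1, 1],  # discusses "water cooling"
    3: [1, 1, 1],  # discusses "cooling of air with water"
}


def search(required_terms):
    """Return the items that carry a 1 under every required term."""
    columns = [TERMS.index(term) for term in required_terms]
    return [item for item, row in MATRIX.items()
            if all(row[c] == 1 for c in columns)]


hits = search(["cooling", "water"])  # [1, 2, 3]; item 2 is "noise"
```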

This simple problem can be solved by an oversimplified 
use of what we in Du Pont have called “role indicators” 
(figure 5). The two materials terms, “water” and “air,” 
are divided into “use of” and “passively receiving an action” 
portions. Now the “noise” in the simple example has been 
eliminated. It can be seen that item 1 discusses “the use of 
water for cooling,” that item 2 discusses “cooling of water,” 
and that item 3 discusses “the use of water for the cooling of 
air.” 
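The effect of the two role indicators can be checked with a small Python sketch of the matrix of figure 5. The layout below is a hypothetical encoding chosen for illustration (each column is a term paired with a role, and “cooling” carries no role); it is not presented as the Du Pont format.

```python
# Columns of the modified matrix: (term, role). "cooling" is not subdivided.
COLUMNS = [
    ("air", "use of"),
    ("air", "passive"),
    ("cooling", None),
    ("water", "use of"),
    ("water", "passive"),
]

# Rows as given in figure 5.
ROWS = {
    1: [0, 0, 1, 1, 0],  # the use of water for cooling
    2: [0, 0, 1, 0, 1],  # cooling of water
    3: [0, 1, 1, 1, 0],  # the use of water for the cooling of air
}


def search_with_roles(required):
    """Return the items carrying a 1 under every required (term, role) column."""
    wanted = [COLUMNS.index(column) for column in required]
    return [item for item, row in ROWS.items() if all(row[c] for c in wanted)]


cooling_water = search_with_roles([("cooling", None), ("water", "use of")])
water_cooling = search_with_roles([("cooling", None), ("water", "passive")])
```

Here `cooling_water` holds items 1 and 3 and `water_cooling` holds item 2 alone, so the false retrieval of the earlier example no longer occurs.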

Of course, two role indicators are insufficient for effective 
storage and retrieval. We in the engineering department 
have found it necessary to subdivide our terms by as many 
as 12 roles; others may find it desirable to develop fewer, 
more or different roles. If role indicators are carefully de- 
signed, as we believe ours are, they may be quite generally 
applicable to numerous broad fields of knowledge. In fact, 
several of our roles carry grammatical connotations such that 


193 








194 


DOCUMENTATION OF SCIENTIFIC INFORMATION 


we are, in effect, inflecting our terms in order to avoid syn- 
tatical ambiguity. Please note, however, that we are not at- 
tempting to provide precise grammar in our indexes. Rather 
we are only attempting to make it possible to provide suffi- 
cient grammar such that the “noise” may be minimized. 


FIGURE 5


Modified “Model” of Index


TERMS:         AIR           COOLING         WATER
ROLES:   USE OF  PASSIVE               USE OF  PASSIVE

ITEMS
  1         0       0           1         1       0
  2         0       0           1         0       1
  3         0       1           1         1       0


A term-with-role-assigned is essentially a precoordination 
of the term with an implied definitive concept term which 
imparts to the term-plus-role an element of syntax or word- 
ordering so that stored information produces fewer false as- 
sociations. This means that there are three basic require- 
ments which must be met by a set of role indicators: 

1. They must be indicative of broad concepts which are 
encountered very frequently in the particular environment of 
the information system. 

2. They must, insofar as possible, be nonambiguous among 
themselves (i.e., mutually exclusive) and—accordingly— 

3. They must be few in number. 

Even the use of role indicators does not completely solve 
the syntactical problem. Role indicators alone will not pre- 
vent “noise” when a document discusses, for example, “the 
corrosion of iron in sulfuric acid and the corrosion of copper 
in nitric acid.” Role indicators would not prevent the re- 
trieval of the document when “information on the corrosion 
of iron in nitric acid” is requested. This problem, however, is 
easily solved by subdividing the horizontal item rows in the 
matrix in a fashion similar to that in which the term columns 
were subdivided. This, in effect, means subdividing the
physical items of information into smaller items. This is
best done by an intellectual subsectioning, not a physical
or “geographical” subsectioning. One might say that we 
really end up by preparing individual sentences describing 
the document at hand, with each sentence having its elements 
made more precise by the use of role indicators. 
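The corrosion example can be worked through in Python. In this sketch, which rests on several assumptions, a document is represented as a list of “sentences,” each a set of terms; role indicators are omitted for brevity, and the document number 101 and both function names are invented.

```python
# One document, intellectually subdivided into two "sentences" (sub-items).
documents = {
    101: [
        {"corrosion", "iron", "sulfuric acid"},
        {"corrosion", "copper", "nitric acid"},
    ],
}


def search_whole_items(query):
    """Match the query against all terms of the document taken together."""
    return [doc for doc, sentences in documents.items()
            if query <= set().union(*sentences)]


def search_subdivided(query):
    """Match the query only within a single sentence of the document."""
    return [doc for doc, sentences in documents.items()
            if any(query <= sentence for sentence in sentences)]


query = {"corrosion", "iron", "nitric acid"}
false_drop = search_whole_items(query)   # [101], although not pertinent
no_drop = search_subdivided(query)       # []
```

A pertinent question such as corrosion of iron in sulfuric acid is still answered by `search_subdivided`, since all of its terms fall within one sentence.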

In such a manner, the original binary rectangular matrix 
is made into a much finer-grained matrix, but it still remains 
a binary rectangular matrix and is logically equivalent to
the system described earlier. By subdividing terms and 
items, we have not eliminated all “noise,” but are only per- 
mitted to make a reasonable compromise between “noise” 
elimination and effectiveness of retrieval. 

At this point there may be discussed some practicable 
tactics in the implementation of the more basic considerations 
of storage and retrieval of information. There is now a choice
as to how the terms and items may be grouped. Consider
again the binary rectangular matrix, with or without terms
and items subdivided because it will make no difference— 
the matrices are logically the same. Because in the actual 
physical world it is impracticable to arrange everything into 
a huge binary rectangular matrix, the information must be 
grouped either according to items (the horizontal matrix 
rows) or according to terms (the vertical matrix columns). 

Let us call systems grouped according to items (the hori- 
zontal rows) randomly grouped systems (figure 6). An 
example of such a system would be one in which a card-fed 
computer of some sort is employed for retrieval purposes; 
each card (figure 7) would be occupied by the index entries 
for one given item; on that card would be grouped together 
the item number (i.e., the “address” of the actual physical 
document) and all the index entries for the item. The fol- 
lowing card would contain the next item number and the 
index entries for that item, etc. Items would be entered on
cards as they arrive at the information system and are 
indexed. Thus, the most recently arrived item would occupy 
the last card in the system. Hence this would be called a 
randomly grouped system because the subject matter would 
have no effect upon the arrangement of the items in the store. 
When searching such an index, it is necessary to examine each 
of the items, one by one, from the first item to the last. Such 
a searching method forces one to look at a tremendous amount 
of information in which one is not at all interested. You 
may say that the randomly grouped store could be subdi- 
vided into various subject classes—but the problems involved 
in classification have already been discussed, and such action 
does not appear to be indicated for large systems. 


FIGURE 6


Randomly Grouped Systems


[Figure: matrix with columns AIR (USE OF, PASSIVE), COOLING, and
WATER (USE OF, PASSIVE); cell values not recoverable.]


FIGURE 7


Randomly Grouped Systems


[Figure: at least one card per item. Entire index must be searched
for each question. Question concerns water used for purposes of
cooling. Items 1 and 3 (and perhaps others) are pertinent.]


Let us call systems grouped according to terms (the ver- 
tical columns) prefiled systems (figure 8). In such systems, 
one might have a card standing for each term in one’s 
vocabulary. On this card, one would enter the item numbers 
(i.e., the document “address”) of the items which have been 
indexed by that term. Now, if one wishes to retrieve in- 
formation, one has only to examine the portions of the store 
which are most likely to contain the information in which one 
is interested (figure 9). That is, of the total set of cards, 
examine only those few cards standing for the concepts pertinent
to the question; in this instance, the cards for “cooling”
and “water (use of).” Note that there are three pertinent
references. This contrasts with the searching method
in randomly grouped systems, which involves examining 
each of the items, one by one, from the first item to the last. 
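The difference in searching effort between the two groupings can be illustrated with a short Python sketch. The card contents below are a hypothetical three-item store modeled on the earlier example, and the functions `search_random` and `search_prefiled` are invented names; each returns the pertinent item numbers together with the number of cards it had to examine.

```python
# Randomly grouped store: one card per item, in order of arrival.
item_cards = [
    (1, {"cooling", "water (use of)"}),
    (2, {"cooling", "water (passive)"}),
    (3, {"air (passive)", "cooling", "water (use of)"}),
]

# Prefiled store: one card per term, listing item numbers.
term_cards = {
    "air (passive)": [3],
    "cooling": [1, 2, 3],
    "water (passive)": [2],
    "water (use of)": [1, 3],
}


def search_random(query):
    """Examine every item card in turn; return (hits, cards examined)."""
    hits, examined = [], 0
    for number, entries in item_cards:
        examined += 1
        if query <= entries:
            hits.append(number)
    return hits, examined


def search_prefiled(query):
    """Pull only the cards for the query terms; return (hits, cards examined)."""
    cards = [set(term_cards.get(term, [])) for term in query]
    hits = sorted(set.intersection(*cards)) if cards else []
    return hits, len(cards)


question = {"cooling", "water (use of)"}
random_result = search_random(question)      # ([1, 3], 3)
prefiled_result = search_prefiled(question)  # ([1, 3], 2)
```

Both searches give the same answer. The number of cards examined grows with the size of the collection in the first case and with the number of terms in the question in the second.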


Figure 8 


Pre-filed Systems 










FIGURE 9


Pre-filed Systems


[Figure: one card per term; cards shown include “Aardvark,”
“Cooling,” and “Water (use of),” each listing item numbers. Search
only the portions of index most likely pertinent to question.
Question concerns water used for purposes of cooling. Items 1,
3 and 984 are pertinent.]


Both the randomly grouped and the prefiled systems are, 
of course, logically equivalent and absolutely equal in re- 
trieval power and effectiveness. They may, however, have 
significantly different characteristics insofar as economics are 
concerned. Especially for idea or information systems and 
particularly where the collection size is large, the prefiled 
method appears to be best. The choice of grouping method, 
however, is not fundamental, because a system grouped in 
either fashion can be changed to the other fashion at any time 
with no loss in effectiveness; there will, of course, be a one- 
time economic penalty in the changeover operation. 
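That either grouping can be changed to the other without loss can be seen from a small Python sketch. The function names and sample data are hypothetical; the point is only that the conversion in each direction is a regrouping of the same term-item pairs, so that a round trip restores the original store.

```python
def to_prefiled(item_cards):
    """Regroup item -> terms into term -> sorted list of item numbers."""
    term_cards = {}
    for number, entries in sorted(item_cards.items()):
        for term in entries:
            term_cards.setdefault(term, []).append(number)
    return term_cards


def to_random(term_cards):
    """Regroup term -> item numbers back into item -> set of terms."""
    item_cards = {}
    for term, numbers in term_cards.items():
        for number in numbers:
            item_cards.setdefault(number, set()).add(term)
    return item_cards


original = {
    1: {"cooling", "water (use of)"},
    2: {"cooling", "water (passive)"},
    3: {"air (passive)", "cooling", "water (use of)"},
}
round_trip = to_random(to_prefiled(original))
```

The one-time penalty mentioned in the text corresponds here to a single pass over every index entry in the store.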

Figure 9 illustrates one form of prefiled index—a cen- 
tralized but simple card file. Figure 10 is an example of the 
same basic sort of index which illustrates provision for de- 
centralized searching. This index format is designed to per- 
mit publication and wide dissemination. The entire index 
is duplicated, side by side, in one book and the two sides are 
bound independently at the top; this facilitates the easy com- 
parison of item numbers listed under any two terms. In 
addition, the item numbers listed under each term are di- 
vided into 10 columns according to their terminal digits and 
this, too, facilitates searching. 
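The terminal-digit arrangement can be sketched as follows. This Python fragment is an illustration only; the item numbers and the names `terminal_digit_columns` and `match_by_terminal_digit` are invented. It shows why the arrangement helps: two matching item numbers necessarily end in the same digit, so only like columns need be compared.

```python
def terminal_digit_columns(item_numbers):
    """Divide item numbers into 10 columns according to their last digit."""
    columns = {digit: [] for digit in range(10)}
    for number in sorted(item_numbers):
        columns[number % 10].append(number)
    return columns


def match_by_terminal_digit(first, second):
    """Compare two term listings column by column; return the common items."""
    a = terminal_digit_columns(first)
    b = terminal_digit_columns(second)
    return sorted(n for digit in range(10) for n in set(a[digit]) & set(b[digit]))


# Hypothetical listings under two terms.
under_first_term = [147, 6, 849, 1, 3, 984]
under_second_term = [741, 1, 3, 984, 12]
common = match_by_terminal_digit(under_first_term, under_second_term)  # [1, 3, 984]
```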





FIGURE 10


Example of a Printed Index


[Figure: facsimile of a printed index page. Headings such as
CATALYSTS, CONCENTRATION (COMPOSITION), CONCENTRATION (PROCESS),
and CONDENSATION (PROCESS) are each subdivided by numbered role
and followed by item numbers arranged in columns according to
terminal digit.]


In the example shown, the terms “catalysts—role 11” and 
“concentration (composition)—role 2” express the question 
“Effects of catalyst concentration.” When these terms are 
coordinated, it is seen that there are 17 pertinent documents, 
as indicated by the item numbers which are encircled under 
each term. These matching numbers can now be matched 
against numbers listed under a third term, and so on. Such
a simple index, updated periodically, will usually be a satis- 
factory retrieval tool until the document collection becomes 
quite large. 

At this point, it may be asked: “How might some of these 
tactics be implemented in practice?” Here will be detailed 
an example of an indexing and retrieval system based upon 
the principles already described. It must be understood that
there are many possible—and legitimate—variations upon 
this major theme, and that this example thus falls into the 
category quoted in Dr. Taube’s paper (1) “How I indexed my 
library—but good.” With these reservations concerning the 
lack of general applicability of the tactics, the examples can 
now be described. 

Following selection of a document for inclusion in the collection,
the following steps may be taken:

1. An accession number—an “address”—is assigned to the 
document; this is usually the number next higher than the 
last previously assigned accession number. 

2. A technically qualified person indexes the document by 
doing the following: 

(a) He analyzes the document to determine its informa- 
tion content. This analysis step is the same for all retrieval 
systems—there is no shortcut for having a qualified person 
gain an understanding of the content of the document. This 
step is also the most expensive step in the input procedure.
Of course, the cost of document analysis may be controlled
by various policy decisions. For example, the policy may be
to skim through the document and not by any means to under- 
stand completely its content; this will make for relatively 
shallow indexing and lower costs. On the other hand, the 
policy may be to have the indexer understand and index the 
document comprehensively; this makes for higher costs. 

(b) The indexer identifies the concepts of knowledge discussed
in the document. This step may be performed along
with the analysis step. 

(c) The indexer evaluates for importance the concepts of 
knowledge discussed in the document; he chooses certain 
ideas for indexing and discards others. 

(d) The indexer describes the information content of the 
document; this is the physical process of indexing. In es- 
sence, this is the thinking out of a set of declarative sentences. 
The important ideas in these sentences are used as indexing 
terms. Appropriate role indicators are used to indicate the 
relationships among these terms. Thus each such “sentence” 
constitutes in the index one subdivision of the document, as
described earlier with regard to subdividing the horizontal rows of the
matrix. 

(e) The indexer then adds other terms, to the extent 
justified, using as source material the thesaurus listings under 
some or all of the terms he has already extracted from the 
document. Assignment of role indicators to these additional, 
redundant terms is routine, being determined largely by the 
role indicators assigned to the originally developed set of 
terms. The cost of input can be controlled to some extent, 
during this step, by making policy decisions concerning the 
extent of use of the thesaurus. 
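Step (e) may be sketched in Python under several loudly stated assumptions. The thesaurus entry is copied from the SALARIES listing shown earlier, with its relation codes “PO” and “RT” kept as printed and not interpreted; the role number 2, the function name `expand`, and the rule that an added term simply inherits the role of the term it came from are all simplifications invented for this illustration.

```python
# Thesaurus listing for one term, keyed by the relation codes as printed.
THESAURUS = {
    "SALARIES": {
        "PO": ["PAYMENTS"],
        "RT": ["COMPENSATION", "PAYROLLS", "WAGES"],
    },
}


def expand(entries, follow=("PO",)):
    """Add thesaurus terms to a list of (term, role) index entries.

    `follow` names the relation codes to use; widening or narrowing it
    stands for the policy decision on the extent of thesaurus use.
    """
    result = list(entries)
    for term, role in entries:
        for relation in follow:
            for added in THESAURUS.get(term, {}).get(relation, []):
                if (added, role) not in result:
                    result.append((added, role))
    return result


narrow = expand([("SALARIES", 2)])
broad = expand([("SALARIES", 2)], follow=("PO", "RT"))
```

Under the narrow policy one redundant term is added; under the broad policy four are added, with a corresponding increase in the cost of input.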

This is the end of the intellectual work of input. It is also 
the end of the most costly part of the operation. Next, the
indexing entries (both terms and accession numbers) for a 
number of incoming documents are sorted (clerically or 
mechanically) into the prefiled order and are posted (cleri- 
cally or mechanically) to the index itself—in whatever form 
it may exist, either manual or mechanized. 
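The sorting and posting operation can be sketched in Python. This is a minimal illustration, assuming the index is held as a mapping from term to a list of accession numbers; the function name `post` and the sample batch are hypothetical.

```python
from collections import defaultdict


def post(index, entries):
    """Sort (term, accession number) entries into prefiled order and post them."""
    for term, accession in sorted(entries):
        numbers = index[term]
        if accession not in numbers:   # do not post the same number twice
            numbers.append(accession)
            numbers.sort()
    return index


# Existing index, then a batch of entries for incoming documents 3 and 4.
index = defaultdict(list)
index["cooling"] = [1, 2]
batch = [
    ("water (use of)", 3),
    ("cooling", 3),
    ("air (passive)", 3),
    ("cooling", 4),
]
post(index, batch)
```

After posting, the card for “cooling” lists items 1 through 4, and new cards exist for the two terms not previously used.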

Searching techniques, of course, depend largely upon the 
physical form of the index. Manual card or printed indexes 
often depend upon the manual matching—the coordination— 
of lists of item numbers posted on terms. Machines do es- 
sentially the same thing, but can often be programed to make 
complex searches in one rather than in many steps. 

In fact, mechanization of an index is merely another tactical
problem. It is apparent that the degree of mechanization
will depend upon a number of environmental factors—such 
as the size of the index, the method of grouping of the index, 
the number of questions which have to be answered per unit 
time, how soon a question put to the index must be answered, 
and other factors as well. If the total store is small, if only 
a few questions come into it each year, and if there is no particular
urgency in answering these questions, obviously a
manual system—undoubtedly a very simple manual system— 
would be the appropriate choice. Other circumstances, how- 
ever, might necessitate an extreme degree of mechanization. 

At this time, one point must be emphasized. Almost any 
machine or device designed for manipulating data or infor- 
mation can be employed in the storage and retrieval opera- 
tion. They can all be programed or wired to do the same 
job. Naturally, there may be differences in the way that one
has to arrange one’s system in order to use a given machine.
For example, some machines would require that one arrange 
the file in random order. Others might dictate arrangement 
in a prefiled order. Such considerations might have a major 
bearing upon the economics of mechanization. And this is 
the only basis upon which mechanization should be con- 
sidered—that of economics. 

Another tactical question to be resolved is this one: “After 
the searcher has found (in the index) the identification num- 
bers of presumably pertinent documents, what then does he 
do?” It has been the experience of most of us that the 
searcher should at this point have available a set of some 
sort of abstracts. He can look up the abstracts of those documents
to which he is referred and decide quickly which ones
are most important to him, which ones to obtain and to read 
first, which ones to defer action on, and (perhaps) which ones 
to ignore altogether. 

On the other hand, abstracts without an index serve only 
a limited purpose—that of advising the reader of current 
events, which may or may not be of interest to him at the 
moment and which he may not remember at some future time 
when they should be of interest. Further, the scanning of 
large numbers of abstracts is laborious and time consuming;
an index to the abstracts permits much higher search
efficiency.

In summary, it may be said that the details, the mechanics,
of an information storage and retrieval system are not
too important at the stage of the game at which the Building 
Research Institute finds itself. Rather, basic considera- 
tions—the building of a firm foundation—should be para- 
mount. It is believed that what has been described herein is 
fundamentally correct—insofar as basic considerations go. 
As we proceeded more into tactics, however, there may well 
be some honest disagreements. It must be reiterated, however,
that the system described today does provide a solution
to the storage and retrieval problem. It permits the oper- 
ators of the system to obtain greater system effectiveness— 
better retrieval—if they are willing to pay more for that 
greater effectiveness. It does not provide perfection, but
who here has ever felt rich enough to build a bridge, or to 
design a dam, which he can guarantee to be perfect for all 
time? Who among us has ever found that which he is willing
to guarantee will be the least costly solution to any problem
for all time? We can only state a firm belief that an information
system built upon the principles discussed today
will be capable of meeting one’s needs to the extent that one 
is willing to pay for that capability and, should more eco- 
nomical techniques be developed in the future, that the sys- 
tem may be converted at small cost. 

The last statement is made based upon mathematical con- 
siderations of the binary rectangular matrix which we dis- 
cussed earlier. These considerations have led, we believe, to 
the beginnings of a mathematical theory of written com- 
munication—a theory which employs the same equations and 
leads to the same end results as those developed by workers in 
other fields. This, we think, attests to the fundamental na- 
ture of the considerations discussed today and indicates that 
while tactics may change because of different environments,
the strategy of a well-designed system will remain valid.






























LITERATURE CITED

1. Taube, Mortimer, Miller, Eugene and Kreithen, Alexander, Systems
for Cataloging and Retrieval of Information, as presented at
Building Research Institute Fall Conference Workshop on Building
[illegible], November 19, 1959.

2. Cherry, Colin, On Human Communication, John Wiley and Sons,
New York, 1957, pp. 16, 17.

3. Bernier, Charles L., and Heumann, Karl F., Correlative Indexes
III. Semantic Relations Among Semantemes: The Technical Thesaurus,
Interscience Publishers, American Documentation, Vol.
VIII, No. 3, 1957, pp. 211-220.

4. Luhn, H. P., A Statistical Approach to Mechanized Encoding and
Searching of Literary Information, IBM Journal of Research and
Development, Vol. I, No. 4, Oct. 1957, pp. 309-317.

5. Taube, Mortimer, and Associates, Studies in Coordinate Indexing,
Volume II, Documentation, Inc., 1954, pp. 72-111.









ESSO RESEARCH & ENGINEERING CO.





















Mr. W. T. Knox, director, Technical Information Division, Esso 
Research & Engineering Co., submitted to the staff, on January 25,
1960, the following summary of the science information retrieval
system employed by that company:


I am happy to send you a description of the science infor- 
mation retrieval system employed within Esso Research & 
Engineering Co., as you requested in your January 19 letter. 
You will notice that our system is a pooling of the talents of 
many individuals along with some rather simple mechanized 
techniques. Our reasons for this are, I am sure, obvious to 
you. The use of published scientific and engineering information
by a research and engineering organization takes
place in many ways. Asking for a specific answer to a scientific
or engineering problem occurs only a part of the time.
We have to be set up also to allow considerable browsing by 
our scientists and engineers in fields which are not directly 
pertinent to their immediate problem. A mechanized system 
for retrieving specific information on request would obviously 
not satisfy this need. There is the additional point that experienced,
able Ph. D. scientists and engineers can see unobvious
relations in the data they are scanning which a machine
cannot, because it cannot be instructed economically to keep 
in mind all of the possible uses for information which exist 
in a typical research and engineering company. In sum- 
mary, we look upon this whole division as a living machine 
for accomplishing the most effective and economical acquisi- 
tion, storage, and retrieval of scientific and engineering in- 
formation which we can at this point devise. This view- 
point is also expressed in the attached reprint of an article 
written about the Technical Information Division in 1957. 
This is not to say, as you will notice in the attachment, that 
we are not actively engaged in applying new machine-based
techniques wherever we can find that they will be economical.
However, it does little good to replace one searching clerk 
by a machine if it requires two clerks to put the information 
into the machine. We are trying to be as progressive and at 
the same time realistic as possible. 





















A DESCRIPTION OF THE TECHNICAL INFORMATION DIVISION, ESSO 
RESEARCH & ENGINEERING CO., AND SOME OBSERVATIONS ON 
THE INFORMATION PROBLEM 












The Esso Research & Engineering Co. is a major scien- 
tific and engineering affiliate of the Standard Oil Co. (New 
Jersey). Those affiliates of Standard Oil Co. (New Jersey) 
which refine, market, and transport crude oil and petroleum 
and petrochemical products look to Esso Research for the 
development of new processes and products, and for techni- 
cal advice on improving efficiency of their operations and 
the quality of their existing products. Esso Research has 
about 1,200 professionally trained scientists and engineers 
employed at its central facilities. In addition, it has research 
contracts with several other Jersey affiliates, representing 
about 500 more professional employees. 

Since the First World War when American science and 
technology began their astonishing growth, Jersey and its 
affiliates have recognized the important place that science 
and technology would play in the petroleum industry. They 
have always been conscious of the value of published techni- 
cal information. At the time of the formation of the origi- 
nal Esso Research & Engineering organization in 1919, 
an information division was one of its four main divisions. 

For a number of years Esso Research management recog- 
nized that keeping up with the technical literature was 
becoming an increasingly difficult problem for its research 
and engineering staff. Esso Research, therefore, early in 
1957, established a technical information division on a 
broader scope than had existed heretofore. The primary 
function of this division is to obtain, interpret, and dissemi- 
nate to Esso Research & Engineering professional people, 
and to professional people in the affiliates, scientific and engi- 
neering information as it appears in the published literature. 
There are about 60 people in the division. 

Basic to the division, of course, is the technical library. 
Our library has about 40,000 volumes of books and bound 
issues of periodicals, and 60,000 copies of patents from the 
United States and other countries. Abstracts of articles of 
interest are filed on cards arranged by subject and author;
we now have almost 2 million cards in our files. These files
are our primary reference sources when we wish to search 
for information, and we use them continually for fast 
answers to telephoned requests as well as for more leisurely 
answers to written requests. 

Our current input is about 600 technical journals, of which 
some 300 are from the United States, 180 from Western 
European countries, 27 in Russian and other Slavic 
languages, 9 in Japanese, and 20 in miscellaneous. We have
a continuous program of acquiring new books (about 1,500
a year) and weeding out obsolete books; new patents from
all sources amount to about 3,000 a year. We also get copies
of about 2,500 preprints issued for many technical meetings. 
These appear many months before they are printed in a journal,
and in some cases, they are never printed in any other
form. This is a very valuable source of technical infor- 
mation. 

Formal technical reports from the various Esso Research 
and Engineering divisions represent the fruit of our research, 
development, and engineering efforts and as such, they are 
most valuable. We have a special reports unit which indexes, 
stores, and retrieves from storage, technical information 
which we have generated within our own company. We have 
recently developed a new indexing system for these reports 
which employs a new indexing code developed by ourselves 
and a punched card sorting machine. This approach is now 
being studied for application to the published literature. 

For the benefit of the scientific staff of Esso Research and 
Engineering and also professional and managerial groups 
in Jersey affiliates throughout the free world, the technical 
information division publishes several abstracts bulletins, 
copies of which are attached. On the front of the two largest 
bulletins we place a one-page brief summary, or “Highlights”
of the contents inside. The various bulletins are the primary
means by which our professional staffs keep up with the
scientific and engineering literature. 

We rely on our own abstracting efforts, but supplement 
these with outside abstracting services. One of the best of 
these, in our experience, is the American Petroleum Institute’s 
Central Abstracting Service. We have found this very help- 
ful. We are quite interested in their current studies to 
broaden the scope of the service and possibly provide a co- 
operative indexing and information searching center, using 
machine storage and retrieval for the petroleum and petro- 
chemical industry. 

We also supplement our own abstracting efforts with ab- 
stracts from the well-known Chemical Abstracts Service. 

The net result of our abstracting effort is that technical 
staffs of the Jersey company and its affiliates get in these 
abstracts bulletins the most important items as they appear 
in periodicals, patents, and preprints of technical meetings. 
About 25,000 abstracts are published each year. This is a 
large number, but it is only a fraction of the total number 
of items being published in our fields of interest. 

Keeping this vast store of information in a readily accessible
form is a difficult task at this point, and gives promise
of being even more difficult before we reach a better solution.
We have a small group within the technical information division
whose job is to look for techniques which will make the
storage and retrieval job much simpler. We have emphasized 
applying rather simple machine-based techniques where they 
can be economically justified. There are many such tech- 
niques and devices available, but most of them require high 
input costs in order to get rapid searching. We are now 
looking at several promising devices which are useful in a 
number of ways. 







Actually our main reliance for information awareness and 
retrieval is placed on a group of scientists and engineers who 
are devoting their full time to keeping up with, analyzing and 
interpreting, and calling others’ attention to developments in
technical fields of interest to Esso. We call this information
research, and it is described rather completely in a paper presented
at the Fifth World Petroleum Congress in New York,
and attached.

Having this group of experienced professional men en- 
gaged full time in information work is quite a break from the 
past and with tradition. The technical men in this work are 
fully professional and are as competent as any technical men 
that we have in the laboratory or engineering divisions. 

It has traditionally, of course, been a part of each technical 
man’s responsibility to do his best to keep up with the litera- 
ture. With the literature now of such formidable propor- 
tions, it is no longer possible for any one man to do this, 
except in a very narrow field. Our technical men engaged 
full time in what we call information research, as opposed to
laboratory research, know the company’s technical program
and the people who are working on the various aspects of it.
They make sure that items of pertinent interest to various
individuals in the laboratory, or development, or engineering
divisions are called very promptly to their attention so that
they will not be overlooked. They also look for new ideas,
new possibilities, which they then bring to the attention of
the responsible member of management for possible inclusion
in Esso Research’s program. These men examine every
abstract which we obtain through our own efforts and through 
subscription to the services I mentioned earlier. 

From these thousands of abstracts, they select a very 
limited number which they consider to be genuine highlights 
of current technical work. In addition to being attached to
the weekly abstracts bulletins, the single page “Highlights” 
are distributed separately to those people whose other respon- 
sibilities are so heavy that they do not have time to look at 
the complete bulletins. These separate “Highlights” pages, 
for example, go to almost 300 professional people, including 
members of management. 

I should now like to make some personal observations 
about this problem of the mushrooming growth of the scien- 
tific literature, based on our experience in Esso Research 
and Engineering. 

First, the manner of using the published literature varies 
greatly from one technically trained man to another. What 
one man wants the next man does not want. It is, there- 
fore, very difficult to generalize about patterns of use of 
scientific information or to say that our scientists and en- 
gineers want this kind of scientific information or they want 
it presented in such a manner. 

The use or the manner of using the published literature ap- 
pears to correlate most closely with what a man is used to, 
and perhaps also, to the nature of his job. The staff at
Esso Research and Engineering have had abstracts bulletins
available to them for many years. The abstracts bulletins 
are merely Esso Research’s efforts to make sure that he is cur- 
rently aware of the scientific and technical literature perti- 
nent to his work. We hope that by making it more available, 
it will encourage him to use it. 

I also believe that considerably more could be done in most 
colleges and universities to train science and engineering 
graduates in the most efficient use of the technical literature. 
Few graduates appear to have more than a casual acquaint- 
ance with technical information sources and searching pro- 
cedures. 

Secondly, our experience at Esso Research and Engineer- 
ing with information research work leads us to believe that 
there is a wealth of new ideas and new applications of exist- 
ing ideas which can be made available with existing tech- 
niques to those who study the literature. The main problem 
is putting high quality scientists and engineers to work in 
this field. We feel that it is very important to recognize that 
the United States has no shortage of information—but a real 
shortage of people talented and trained and motivated to 
make use of it. Our Esso experience with first-rate scien- 
tists and engineers engaged full time in information research 
work has been very good. Without an elaborate information 
processing system employing electronic computers and simi- 
lar devices, we have found it quite feasible to use the pub- 
lished literature in our field and to use it very effectively. 
Other industrial research and development firms have had a 
similar experience. In some areas of science and technology 
where the use of published literature has not been practiced 
as extensively as in the chemicals and petroleum fields, there 
seems to be greater haste to use mechanized systems. This 
may be the result of inadequate experience in industrial 
research and development and resultant unfamiliarity with
the best techniques for making effective use of the published
information. 


After reviewing the original draft of the report prepared by the 
staff, for revisions and suggested changes, Mr. Knox submitted the
following additional comments relative to certain sections of the
report: 


The report, in general, appears to be an excellent sum- 
mary of a very complex subject, and you and your staff are 
to be commended for its preparation. * * * 

Your criticism of the administrative “need-to-know” 
regulations used by some Federal agencies to withhold un- 
classified Government research results from certain indus- 
trial research organizations is well taken. Surely the 
national security rests ultimately on the health and efficiency 
of all parts of the national research effort, and not just on 
those parts which have contracts with Federal agencies. We
hope your pointing out this hampering restriction will speed
its abolishment.

We also support your conclusion that an all-embracing 
Federal information center in science and technology is not 
now feasible. However, we share your concern that some 
Federal agencies are not pushing more vigorously better in- 
formation systems within their operations. This problem 
needs continued effort on systems research and development 
merely to keep abreast; a sense of urgency is needed to make 
progress. We hope your highlighting this problem area 
will act as a spur where it is applicable. 

It is difficult to follow the reasoning set forth in the re- 
port draft to the effect that because the National Science 
Foundation is not an operating agency, there is serious doubt 
whether the NSF is now providing or will provide the neces- 
sary leadership and coordination necessary in this field. It 
seems to me that, unless you wish the Federal Government to 
take responsibility for instructing people in the use of sci- 
entific information and supervising its use, the Federal Gov- 
ernment’s role should be simply to take the necessary steps 
to insure that this information is known and available to the 
Nation’s scientists and engineers. It is fallacious to design 
and build a system to force scientific information on its users; 
“you can lead a horse to water, but you can’t make him
drink.” The proper purpose of a scientific information sys- 
tem is to provide the information when it is needed in the 
judgment of the scientist or engineer. I believe this has been 
the guiding policy in ASTIA’s operations; it has also been 
the experience of all industrial research groups with which 
I am familiar.

If the above argument is valid, then it appears to me that 
the performance of the NSF in this field should be measured 
against whether scientific and engineering information is 
now more available than formerly, and whether its plans for 
future progress appear adequate, timely, and efficient. 

Since scientific information is generated from almost 
countless sources, the NSF in 1958 was faced with a basic 
policy decision—whether to endeavor to collect, process, and 
disseminate all this information itself (for which I believe 
it had statutory provision), or to encourage those generating 
this information to make it better known and more avail- 
able to those needing it. The latter course, if it works, is un- 
doubtedly more efficient, both in dollars and in utilization 
of the Nation’s limited resources of scientists. It appears to 
me that the primary determinants affecting the success or 
failure of the NSF’s present scientific information program 
are (1) the responsibilities and authority given to NSF vis- 
a-vis other Federal agencies, and (2) the leadership of the 
scientific information program within NSF; the determinant 
is not whether NSF is an “operating” agency in the same 
sense as, say, the Department of Commerce. 

* * * My concern, and I am sure it is equally yours, is for 
the maximum efficient rate of progress in making the tremen-
dous amount of scientific information known and available
to the Nation’s scientists and engineers.







GENERAL ELECTRIC CO.


In response to the staff’s request relative to science information 
systems in use or being developed by the General Electric Co., Mr. 
Clair C. Lasher, general manager, computer department, wrote the 
staff on February 29, 1960, as follows: 


Mr. Robert Paxton, president of the General Electric 
Co., has asked me to reply to your inquiry of January 
19, 1960, about science information systems that are being 
developed by industry and the Federal Government. 

We view the development of science information systems as
a vital interest of the Nation and the company. Except for 
large military systems, we believe that they are the most 
rapidly developing phase of electronic information systems in 
general, which are required by all disciplines for all fields of 
knowledge and for all practical activities which create, re- 
quire, and use information. 

Mr. Bernard K. Dennis, manager of General Electric’s 
Technical Information Center, Flight Propulsion Division, 
Building 305, Cincinnati, Ohio, is using an IBM 704 com- 
puter with magnetic tape input to search some 50,000 docu- 
ments, the majority of which come from ASTIA. A man- 
ually operated coordinate index was entered onto tape for 
this installation. It requires about 4 minutes to make a 
search of 99 questions in 1 batch; it takes somewhat longer 
to print out the results, depending on how many entries were 
found and how much text the questioners require for the en- 
tries. The program for this search is available to any organ- 
ization with similar equipment available, and a 13-minute 
sound 16-millimeter color movie is available on loan to de- 
scribe the service. 
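
The batch search described in the preceding paragraph can be illustrated with a minimal modern sketch. It is not the IBM 704 program mentioned in the letter; the record layout, the names `Question` and `batch_search`, and the sample document numbers are all assumptions made for illustration. It shows a single sequential pass over tape-like records in which every question in the batch (up to 99, the figure quoted in the letter) is tested against each document's index terms.

```python
from dataclasses import dataclass, field

MAX_QUESTIONS_PER_BATCH = 99  # batch size quoted in the letter


@dataclass
class Question:
    """A hypothetical search request: every listed term must be present."""
    number: int
    required_terms: frozenset
    hits: list = field(default_factory=list)


def batch_search(records, questions):
    """Scan the file once, testing each record against all questions.

    `records` is any iterable of (document_id, set_of_index_terms),
    standing in for a sequentially read magnetic tape.
    """
    if len(questions) > MAX_QUESTIONS_PER_BATCH:
        raise ValueError("too many questions for one batch")
    for document_id, terms in records:
        for question in questions:
            # Coordinate search: a hit requires all terms together.
            if question.required_terms <= terms:
                question.hits.append(document_id)
    return {q.number: q.hits for q in questions}


if __name__ == "__main__":
    # Hypothetical sample file and questions.
    tape = [
        ("AD-1001", {"turbine", "blade", "fatigue"}),
        ("AD-1002", {"turbine", "cooling"}),
        ("AD-1003", {"blade", "fatigue", "titanium"}),
    ]
    batch = [
        Question(1, frozenset({"blade", "fatigue"})),
        Question(2, frozenset({"turbine"})),
    ]
    print(batch_search(tape, batch))
```

Because the file is read only once for the whole batch, the cost of a pass is shared among all the questions, which is one plausible reason for grouping questions when the index is held on tape.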

The Computer Department of General Electric has worked 
on several projects, any one of which could be used for the 
control and retrieval of scientific information, but which 
taken together imply our capability in handling information 
of all kinds. 

From February 1958 to September 1959, the computer de-
partment carried through two phases of a five-phase project
called 438-L, for AFCIN. This project is designed to facili- 
tate the collection, evaluation, retrieval, use, and dissemina- 
tion of intelligence with the aid of high-speed computers and 
other electronic auxiliary and peripheral equipment. The 
work was transferred to the information systems section of 
the defense systems department in September 1959. The 
Air Force is holding project 438-L in a quiescent stage
now.

The computer department has been working for the past 3
years with Western Reserve University’s Center for Docu-
mentation and Communication Research, designing and
developing equipment which is known as the GE-250 tran- 
sistorized information searching selector. We have con- 
tracted to deliver one of these units to Western Reserve this 
year, and a second selector to the well-known biographical
dictionary for scientific and technical personnel in this
country, “American Men of Science,” now going into its
10th edition.

The GE-250 at Western Reserve University will be used
in the Metals Documentation Service of the American So- 
ciety for Metals. It will be used by Western Reserve under 
a contract from the National Science Foundation as noted 
in your staff memorandum. The conduct of the test of the 
Western Reserve University information system using our 
GE-250 will be monitored by an ad hoc committee appointed 
by the National Academy of Sciences-National Research
Council for the specific purpose. 

The GE-250 is designed to accommodate all of the intel- 
lectual systems devised for handling information to date; 
such as classification, subject headings, coordinate index- 
ing, telegraphic abstracting, etc.; it is not limited to the 
method sponsored by Western Reserve University. 

The computer department has conducted a brief survey 
of the Research Information Service of the Division of Re- 
search Grants of the National Institutes of Health, and con- 
cluded that the manual retrieval system now in use should 
be converted to electronic equipment in the near future to 
protect the present investment and assure improved service 
to the vital and rapidly expanding research programs of 
NIH. The survey included a visit to the Biosciences In- 
formation Exchange of the Smithsonian Institution and con- 
sideration of its relationship to similar NIH activities. 

The computer department presented a proposal for an 
electronic information system for the Library of Congress 
on October 1, 1959 (CDG-3878). This proposal is being re- 
viewed by the Committee on Mechanized Information Re- 
trieval at the Library of Congress in competition with pro- 
posals from the Ramo-Wooldridge Corp. and the IBM 
Corp. 

The proposal to the Library of Congress (LC) is for a 1- 
year effort on a broad scale to study, design and simulate the 
essentials of an electronic information system in order to 
demonstrate the feasibility of the systems approach to LC’s 
problem. Several alternatives are also proposed for the early 
mechanization of certain LC operations which readily lend
themselves to hardware implementation at this time, using 
equipment which represents the initial building blocks of an 
evolutionary framework for a complete electronic informa-
tion system. For example, the GE-250 seems to be particularly
suited for the quick retrieval requirements of the Legisla- 
tive Reference Service, which serves the Congress and its 
staffs directly for LC. 

We are convinced by our survey and subsequent studies and 
analyses that an electronic information system can be de- 
signed, equipped, and installed and operated in such a way 
as to accomplish substantial improvement in the performance 
of the Library’s present basic functions. We infer from re-
lated experience with other systems that the potentialities
are so great that the Library should be able to undertake new 
and as yet undefined functions in serving the Congress and 
the Nation. 

The history of system design and computer technology is 
one of rapid increase in capabilities wherever the needs can 
be defined clearly. Systems design and computer technology 
already offer such potentialities that we believe the Library 
of Congress should adopt a broad systems approach, under- 
take design studies to develop this systems approach, and 
then gradually install equipment which will be incorporated 
unit by unit into the electronic information system. 

The preceding paragraphs tell why we cannot submit a 
brief outline of our system as you have requested. We are 
operating part of a system with the aid of a general purpose 
computer, we are manufacturing a special purpose computer 
to perform information retrieval, and we are proposing the 
study, design, development, and installation of complete in- 
formation systems, but so far as we know, there are no com- 
plete systems in existence anywhere. 

We have developed information and opinions on the merits 
and potentialities of information systems in Government and 
other industries. Several of the Government systems are or 
have been classified under security regulations. Some of our 
employees have been properly cognizant of these systems, but 
you are aware that we may not consider them in unclassified 
correspondence and we know that you will call upon us after 
making appropriate arrangements. 

The computer department has two employees in the Wash- 
ington area who are familiar with the field of scientific docu- 
mentation and with the work of the company. They are 
certainly available for consultation at your convenience. One 
is Mr. Hans C. Ullmann, who was an employee of the Armed 
Services Technical Information Agency from 1947 until 
February 1959; he made the survey of the activities of the 
Research Information Service in the Division of Research 
Grants, National Institutes of Health. The other is Mr. 
C. D. Gull, who worked 7 years in the Library of Congress, 
part of it in the predecessor to ASTIA there, worked 2½
years for Documentation, Inc., and 3½ years for the National
Academy of Sciences-National Research Council. He is 
currently president of the American Documentation Insti- 
tute, the only professional society in this country which is 
actively interested in this problem, and served 1958-60 as 
Chairman of the Committee on Dissemination of Technologi- 
cal Information About Materials and Materials Research of 
the Materials Advisory Board of the National Academy of 
Sciences-National Research Council. Mr. Gull also carried 
through the preparation of the proposal to the Library of 
Congress. 




BIBLIOGRAPHY 


“Automatic Information Retrieval,” 13-minute film (16 mm. in sound
and color) available on loan upon request to B. K. Dennis, General
Electric Co., FPD Technical Information Center, Building 305, Cin-
cinnati, Ohio.

Barton, A. R., Kaplan, L. N., and Schatz, V. L. “Information Retrieval on
a High Speed Computer,” Western Joint Computer Conference. Pro-
ceedings: 77-80, March 3-5, 1959.

General Electric Company Computer Department. The General Elec-
tric GE-250 information searching selector. January 1960. Sales
Brochure CPB-57A.

Gull, C. D., and Dodge, P. O. The Transistorized Information Searching
Selector. Phoenix, Arizona, 1959. 33 p. (CPB-82). Prepared for
International Conference for Standards on a Common Language for
Machine Searching and Translation at Western Reserve University,
September 6-12, 1959; to be published in its Proceedings.

Following consultations between representatives of the General 
Electric Co. and members of the committee staff, Mr. B. K. Dennis, 
manager, technical information division, further advised on March 
16, 1960, that GE was submitting, in cooperation with Western Re- 
serve University and the American Society for Metals, a proposal to 
the National Science Foundation for a study of problems associated 
with operating mechanized science information systems in research 
and development organizations. Mr. Dennis’ letter and a draft of the 
proposal follow:


In his response to your January 19, 1960, inquiry to Mr. 
Robert Paxton, president of the General Electric Co., about 
science information systems that are being developed by in-
dustry and the Federal Government, Mr. Clair C. Lasher,
general manager of General Electric’s computer department,
briefly described work being done here in Cincinnati in the 
flight propulsion division. To supplement the information 
provided by Mr. Lasher, I have enclosed the preliminary 
draft of a proposal we are now preparing for consideration 
by the National Science Foundation. 

The proposed study would focus the combined know-how 
of General Electric and Western Reserve University on the 
identification and solution of problems associated with oper- 
ating mechanized science information systems in research 
and development organizations. Operating experience data 
is much needed by designers and potential users of these 
systems. 

General Electric’s Technical Information Center is already 
an experienced operator of a large, high-speed mechanized 
information storage and retrieval system. Western Reserve 
University’s Center for Documentation and Communication 
Research is an internationally recognized leader in the field of 
mechanized information handling. We propose that a joint, 
General Electric-Western Reserve University effort could go 
a long way toward providing Government and industry with 
concrete, science information system operating data hereto- 
fore not possible to obtain. 

I would like to reaffirm Mr. Lasher’s offer of our movie 
“Automatic Information Retrieval.” The film has been bor-
rowed by about 70 organizations in both Government and in-
dustry during the past year. Viewers of the movie have
found it enlightening and thought stimulating. We would 
be pleased to lend the film to you at your convenience. Also, 
should you wish to discuss our retrieval system or our tech- 
nical information center, I will be glad to meet with you. 


[Preliminary draft] 
MACHINE LITERATURE SEARCHING IN A SPECIAL LIBRARY 8
I. SUMMARY


Attempting to provide an effective special library program
in support of its scientific and technical activities, General 
Electric’s Flight Propulsion Division has devoted consid- 
erable time, energy, and funds to developing and operating a 
mechanized information center. General Electric is now in- 
terested in conducting an experiment aimed at accomplishing 
three main objectives:

(1) To evaluate the effectiveness of its literature pro- 
gram and compare it with the more detailed system de- 
veloped for the American Society for Metals by Western 
Reserve University;

(2) Through this evaluation and comparison, to im- 
prove both systems and work toward a standardization 
of input practices that will be of value to all who will use 
such systems;

(3) To investigate a new and direct method for meas- 
uring, as quantitatively as possible (dollars, time, per- 
cent), the value of information services that can be pro- 
vided to research and development personnel. 

A 2-year program is proposed for evaluating two types of 
services within a special library—the Uniterm system and the 
WRU method. Every effort will be made to identify and 
control variables and to make comparisons on as scientific 
a basis as the work will permit. 


II. INTRODUCTION 


Over the past several years, General Electric and other in- 
dustrial organizations have been attempting, with varying de- 
grees of success, to cope with the problem of obtaining in- 
formation to support decisionmaking in the face of an 
increasing amount and complexity of published literature. 

From the quantity and complexity of information that is of 
importance, it has become obvious that decisionmakers—in 
research as well as in other areas—cannot read and remember 
all the literature that may be of even immediate—let alone 
potential—use. In developing a special library to serve the 
needs of persons within the organization, completeness of
coverage has often been sacrificed because of the tremendous
number of publications which must be scanned and their con-
tents analyzed for retrieval. Despite the usual budget lim-
itation for this type of work, a considerable amount of money
is being spent on literature processing throughout industry.

8 Submitted to National Science Foundation, Washington, D.C., by Technical Information
Center, Flight Propulsion Division, General Electric Co., in cooperation with Center for
Documentation and Communication Research, Western Reserve University, and American
Society for Metals.

General Electric’s Technical Information Center has car- 
ried the processing of literature forward in spite of the 
relentlessly increasing volume of material and the inade- 
quacy of available abstracting and indexing services. It now 
covers most of the subject matter of interest to General Elec- 
tric’s Flight Propulsion Division to a certain degree, but 
incomplete content indication and material coverage are in- 
dicating the virtual necessity of increasing economy and ef- 
fectiveness through cooperation with centralized information 
processing agencies and through mechanization. 

In continuing the development of mechanized searching 
systems, it behooves General Electric and all research and 
development operations to be certain that they are making ef- 
fective use of the best available tools and techniques in the 
field of documentation. They should also be assured that the 
optimum standardization is in effect so that interchange be- 
tween libraries will be complete and efficient. The early work 
in this field should be so guided that the sound foundation al- 
ready laid will be maintained and strengthened for future 
work. Also, maximum use should be made of work already 
done. 

Accordingly, a cooperative project is proposed which
would involve three organizations: (1) The General Electric 
Technical Information Center, at Evendale, Ohio, (2) the 
Western Reserve University Center for Documentation and 
Communication Research, and (3) the American Society 
for Metals-Metals Documentation Service. 


III. PROPOSED EXPERIMENTAL PROGRAM


Over the past several years the Technical Information Cen- 
ter of General Electric’s Flight Propulsion Division has 
maintained a document-processing activity leading to the 
machine searching of literature in various subjects of science 
and engineering. One of the subject areas of extreme in- 
terest to this facility is metallurgy and allied fields. 

Questions are being asked at an increasing rate by research 
personnel of the General Electric Co. and mechanized 
searches are being performed on an IBM 704. Copies of
abstracts which meet the search criteria are printed and 
presented to questioners. 

The system used (the Uniterm system of coordinate index- 
ing), the searching equipment, and the programs involved 
have been described in earlier reports (app. A) and in a 
movie. 
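
For readers unfamiliar with coordinate indexing, the following is a minimal sketch of the general idea, not of the programs cited in appendix A. Each Uniterm carries a list of the document numbers posted under it, and a search coordinates terms by intersecting those lists. The function names `build_postings` and `coordinate` and the sample postings are hypothetical.

```python
from collections import defaultdict


def build_postings(documents):
    """Map each Uniterm to the sorted list of document numbers posted to it."""
    postings = defaultdict(set)
    for number, uniterms in documents.items():
        for term in uniterms:
            postings[term].add(number)
    return {term: sorted(numbers) for term, numbers in postings.items()}


def coordinate(postings, terms):
    """Return document numbers posted under every one of `terms`."""
    term_list = list(terms)
    if not term_list:
        return []
    result = set(postings.get(term_list[0], []))
    for term in term_list[1:]:
        # Keep only documents also posted under the next term.
        result &= set(postings.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical documents and their assigned Uniterms.
    sample = {
        101: ["nickel", "alloys", "creep"],
        102: ["nickel", "corrosion"],
        103: ["alloys", "creep", "titanium"],
    }
    index = build_postings(sample)
    print(coordinate(index, ["alloys", "creep"]))
```

A term with no postings simply yields an empty result, so a question that names an unused Uniterm retrieves nothing rather than failing.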

One of the most pressing problems in operating a mech- 
anized information center is the acquisition of processed and 
encoded material ready for searching. The cost of analysis
of the documents in the various fields of interest to the Gen-
eral Electric Co. represents a considerable investment. If
processed information, ready for mechanized search, can be 
acquired promptly on a cost-sharing basis, a great saving will 
be realized. 

On January 1, 1960, the American Society for Metals an- 
nounced the start of the Metals Documentation Service (see 
app. B) which offers two major services: The provision of 
current awareness searches through the facilities of the Cen- 
ter for Documentation and Communication Research at 
Western Reserve University and also the provision of en- 
coded material on a suitable storage medium so that other 
organizations can conduct their searching operations at their 
own facility. 

During discussions with representatives of the American 
Society for Metals and Western Reserve University, it be- 
came evident that organizations such as General Electric 
would be much interested in making use of the second alterna- 
tive mentioned above. Such interchange of data would pave 
the way for comparison of the two existing systems, call 
attention to the strengths and weaknesses of both, and tend 
to produce a coordinated approach of value to all who are 
interested in machine searching. Arrangements, except for
funding, have been initiated for General Electric to obtain 
the material necessary to carry out such a program. 

Several fortunate circumstances appear to be converging 
to permit the ready initiation of a project in this field: (a) 
General Electric is to obtain this year an IBM 7090 which 
will search rapidly both ASM encoded abstracts, and Uni-
terms; (b) a project is being conducted by the University
of Arizona to prepare a program for IBM 709 (or 7090,
which is the same relative to this program) to simulate the 
GE 250 search as it will be conducted at the Center for 
Documentation and Communication Research when the 
equipment is installed. 

It is of great interest not only to the General Electric Co., 
but also to the Center for Documentation and Communica-
tion Research and the American Society for Metals to cooper-
ate in a program for comparing and testing the relative effec-
tiveness of various information retrieval systems. There-
fore, a five-phase experimental program is herewith proposed.


Phase I: Acquisition of encoded abstract file


Arrangements will be made for the American Society for
Metals to provide a copy, on punched cards, punched tape, or 
magnetic tape, of its complete encoded abstract file. In ad- 
dition, conventional abstracts will be furnished on micro- 
film to serve as readable output of machine searches to be 
conducted during this experiment. 


Phase II: Analysis of the two systems


This phase will involve a cooperative study (between the 
General Electric Co., the Center for Documentation and 
Communication Research at Western Reserve University,
and the American Society for Metals) of the similarities and
differences between the two methods of analysis for machine 
searching—the Uniterm system and the system involving the 
encoding of telegraphic abstracts. The exact means of con- 
verting ASM telegraphic abstracts into the same form as the 
present mechanized system of General Electric’s Technical 
Information Center is to be determined. This translation 
will constitute a sizable portion of the project and will throw 
much light on the extent to which common ground may exist 
between the two systems. A depth of analysis comparison 
as related to costs will be included. This study will involve 
discussions between the interested groups and the prepara-
tion of a phase report.


Phase III: Programing of the computer


The computer, either the presently available IBM 704 or
the soon-to-be-acquired IBM 7090, will be programed to con-
duct searches: (a) on the Uniterm basis using the ASM en-
coded abstract material (this is done to provide a maximum
base and reduce the number of variables to a minimum); (b)
on a basis to simulate the WRU search by the GE 250 (the
development of this program has already been undertaken
by the Numerical Analysis Laboratory of the University of
Arizona for a program for Fort Huachuca, Ariz.). It is an-
ticipated that this program can be acquired for purposes of
this study through SHARE and will be adequate without
revision.


Phase IV: Conducting of searches


Approximately 200 searches will be conducted over a period 
of 2 years using each of the methods—the encoded abstract 
and that based on Uniterms derived from these abstracts— 
for two types of problems: 

(a) In response to actual questions which scientists and 
engineers need to have answered for research in which they 
are currently engaged. 

(b) In response to problems where reports are issued so
the value of the search can be determined in relation 
to experimental research. This will provide the basis for 
investigating a direct method for determining the value of 
information services. Completed projects will be selected and 
literature searches conducted to compare stated results with 
previously documented work to the extent that it is included 
in the ASM encoded material. 

The results of the searches will be made available to scien- 
tists and engineers submitting problems, and studies made of 
the satisfaction resulting from the two methods used for 
conducting the search. In order to isolate the system factors 
in this comparison, the form of the response (abstracts sup- 
plied) to the questioner will be the same, the only difference 
being the material that has been selected in response to the 
question asked. 
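
Because the form of the response is held constant, the comparison reduces to which abstracts each method selected for the same question. A minimal sketch of that bookkeeping follows, assuming each search returns a list of abstract numbers. The function name, the field names, and the overlap measure are hypothetical and are not part of the proposal.

```python
def compare_selections(encoded_hits, uniterm_hits):
    """Summarize how two searches of the same question differ in what they selected."""
    encoded, uniterm = set(encoded_hits), set(uniterm_hits)
    union = encoded | uniterm
    shared = encoded & uniterm
    return {
        "both": sorted(shared),
        "encoded_only": sorted(encoded - uniterm),
        "uniterm_only": sorted(uniterm - encoded),
        # Share of all selected abstracts that both methods agreed on.
        "overlap_ratio": len(shared) / len(union) if union else 1.0,
    }


# Made-up abstract numbers for one question.
summary = compare_selections([1, 2, 3], [2, 3, 4])
print(summary)
```

The abstracts found by only one method are the ones worth putting before the questioner, since they show where the two systems actually diverge.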










Phase V: Evaluation and report 


The final phase of this program will involve reduction of 
the data obtained from the searches and from discussions of 
the service with scientists and engineers who participate in 
this program. It will be highly desirable for members of 
the National Academy of Sciences ad hoc Documentation 
Committee to submit questions so they can have personal 
knowledge of and participate in this evaluation. The results 
of the searches using the two systems will be compared, 
and an estimate will be made of the effectiveness of each 
system in meeting the criteria of search. Also, an evaluation 
will be made of searches related to completed projects 
as to the amount of money saved or other benefits gained by 
conducting searches—as compared to the actual results in 
the formal report from the program. 


IV. RESULTANT VALUES 


(1) The General Electric Co., Western Reserve Univer- 
sity, the American Society for Metals, the National Science 
Foundation, and others who are interested will have a com- 
parison and evaluation of the two systems which can be used 
to guide them in the advantageous exploitation of these and 
similar systems. 

(2) The comparison and use of the two systems will consti- 
tute a major step in “debugging,” improving, and unifying 
these systems. 

(3) This work will provide valuable information as to the 
factors to be considered in establishing procedures for 
interconvertibility among abstracts and symbols for information 
retrieval such that material, once abstracted and coded, can 
be readily exchanged between these and other information 
handling systems which may be developed. 

(4) This work should give rise to new methods and tech- 
niques which will improve these systems and/or open the way 
to establish new and better approaches. 

(5) Conducting this study will make possible a corollary 
benefit—investigating a new and direct method for measur- 
ing the impact of an effective information program on the 
technical capabilities of a research and development organ- 
ization. 

V. PROPOSED WORK SCHEDULE 


(1) On assurance of favorable consideration of this proposal, 
the telegraphic code dictionary will be obtained and 
development of a system for machine translation from WRU 
coding to Uniterms will be initiated. 

(2) Final arrangements for transferring a copy of the 
complete set of ASM abstracts and coded tape to GE, Even- 
dale, will be made. 

(3) The complete ASM file will be mechanically converted 
for Uniterm searching. 

(4) All articles which have been prepared for searching by 
both systems will be identified and used to evaluate the com- 
pleteness of content indication provided by the methods as 
now used. This will also provide a check of the reliability of 
machine translation from WRU encoded abstracts to 
Uniterms. 

(5) Questions will be solicited from normal research and 
development channels for the actual testing of the systems. 

(6) After the files are prepared for searching on IBM 
equipment at Evendale (probably about the start of the 
second year) the following types of searches will be 
conducted: 

(a) GE Uniterm system as now used versus WRU tele- 
graphic abstracts converted to Uniterms on a sufficient sample 
to compare effects of varying depths of analysis. 

(b) WRU encoded telegraphic abstracts versus telegraphic 
abstracts converted to Uniterms—both searched by GE per- 
sonnel to compare the systems as operated by one group of 
people. 

(c) Sample of searches conducted under (b) versus WRU-SS 
system with questions analyzed by WRU personnel to 
determine whether other people can use the WRU-SS system 
satisfactorily. (All of the above work (a), (b), and (c) 
will be done on the IBM 7090.) 

(d) WRU-SS system and arrangement with WRU 
searchers—simulated program on IBM 7090 versus GE 250 
on sufficient sample to establish validity of simulated pro- 
gram. 

(7) Each questioner will be asked to evaluate the material 
supplied to him from the various searches on a standard form 
for accurate compilation and optimum evaluation. The Na- 
tional Academy of Sciences ad hoc Documentation Com- 
mittee will be asked to assist in developing this form to 
assure its validity. 

(8) Questioners’ evaluations will be compiled and sum- 
marized. 

(9) Data will be analyzed to determine correlation be- 
tween depth of analysis and search effectiveness. 

(10) Suggestions for standardization, improvements on 
the two systems, and new system concepts will be summarized 
and evaluated. 

(11) Monthly contacts (letters, phone calls, or visits) will 
be maintained with National Science Foundation and Na- 
tional Academy of Sciences ad hoc Documentation Com- 
mittee to report on progress. A final report will be issued 
upon completion of the study. 
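
The proposal does not say how the correlation in item (9) would be computed. One conventional choice is the Pearson coefficient, sketched below in Python on the assumption that depth of analysis is expressed as index terms per abstract and search effectiveness as a questioner's rating. Both measures, the function name, and every figure shown are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Made-up figures: index terms per abstract and mean questioner rating.
depth_of_analysis = [4, 8, 12, 16]
effectiveness = [2.0, 3.0, 3.5, 4.5]
r = pearson(depth_of_analysis, effectiveness)
print(round(r, 3))
```

A coefficient near +1 would indicate that deeper analysis goes with better-rated searches. It would not by itself show that the added depth is worth its cost, which is the separate cost comparison called for in the earlier phases.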


VI. FACILITIES AND PERSONNEL 


Details of facilities and qualifications of contributing per- 
sonnel will be included in the final draft of this proposal. 

















VII. BUDGET 





It is estimated that this program can be completed during 
a period of 2 years, in cooperation with Western Reserve 
University and the American Society for Metals. Total 
program expenditure would be less than $160,000 for the 2- 
year period with costs distributed approximately as follows: 


Project cost estimate 

                                                          [1st year]  [2d year] 
General Electric Co.: 
    [Item illegible]....................................    $12,000    $12,000 
    [Item illegible]....................................      7,000      5,000 
    Engineering [remainder illegible]...................      6,000      6,000 
    [Item illegible]....................................      3,000      2,000 
    [Item illegible]....................................      2,000      2,000 
    Machine programing and operation 1..................     25,000     15,000 
        [Total direct cost].............................     55,000     42,000 
    [Indirect cost (at 20 percent)].....................     11,000      8,400 
        General Electric total cost per year............     66,000     50,400 

Western Reserve University: 
    Part-time specialist and supporting staff...........      9,000      9,000 
    [Item illegible]....................................      1,500      1,500 
        [Total direct cost].............................     10,500     10,500 
    Indirect cost (at 20 percent).......................      2,100      2,100 
        Western Reserve University total cost per year..     12,600     12,600 

American Society for Metals: Materials and services.....  [amounts illegible] 
        [Item illegible]................................  [amounts illegible] 
        Total cost of project...........................  [amount illegible] 

1 Machine programing cost estimate is based on assumption that the University of Arizona 
program [simulating] the GE 250 on the IBM 709 is available without cost and is satisfactory 
for this study. 
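
The legible figures in the estimate can be checked against one another. The sketch below, in Python, assumes that the 20 percent indirect rate stated for Western Reserve University also applies to General Electric, which the printed General Electric figures bear out. The variable and function names are hypothetical. The American Society for Metals amount is not legible in the source, so only a ceiling implied by the "less than $160,000" statement is computed.

```python
# Legible figures from the project cost estimate, in dollars (first, second column).
GE_DIRECT_ITEMS = [
    (12_000, 12_000),
    (7_000, 5_000),
    (6_000, 6_000),
    (3_000, 2_000),
    (2_000, 2_000),
    (25_000, 15_000),  # machine programing and operation
]
WRU_DIRECT_ITEMS = [
    (9_000, 9_000),    # part-time specialist and supporting staff
    (1_500, 1_500),
]
INDIRECT_PERCENT = 20  # stated for WRU; assumed here for GE as well
PROGRAM_CEILING = 160_000


def yearly_totals(items, indirect_percent=INDIRECT_PERCENT):
    """Return (direct, indirect, total) for each of the two columns."""
    result = []
    for column in range(2):
        direct = sum(item[column] for item in items)
        indirect = direct * indirect_percent // 100
        result.append((direct, indirect, direct + indirect))
    return result


ge = yearly_totals(GE_DIRECT_ITEMS)
wru = yearly_totals(WRU_DIRECT_ITEMS)
legible_total = sum(total for _, _, total in ge + wru)
asm_ceiling = PROGRAM_CEILING - legible_total

print(ge, wru, legible_total, asm_ceiling)
```

The computed totals agree with the printed General Electric and Western Reserve University totals per year, so the illegible labels do not hide any arithmetic discrepancy in those two sections.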


















INTERNATIONAL BUSINESS MACHINES CORPORATION (IBM) 


On February 19, 1960, a conference was held between members of 
the staff of the Committee on Government Operations and represent- 
atives of IBM, in Washington. In attendance at the conference 
were the following IBM officials: 


Mr. J. D. Aron, Manager, Applied Science, Federal Sys- 
tems Division, Washington, D.C. 

Mr. R. K. Jurgen, Manager, Scientific Information, New 
York, N.Y. 

Dr. G. W. King, Manager, Lexical Processing, Research 
Center, Yorktown Heights, N.Y. 

Mr. A. Lett, System Design Engineer, Informer-Fieldata 
Contract, Federal Systems Division, Kingston, N.Y. 

Dr. G. W. Petrie III, Manager, Information Retrieval Pro- 
gram, Advanced Systems Development Division, White 
Plains, N.Y. 

Mr. A. V. Tauber, Director, Marketing Services, Federal 
Systems Division, Washington, D.C. 


Pursuant to the request of the staff, a report based upon this con- 
ference, summarizing the information retrieval activities at IBM 
as outlined in detail at the February 19 meeting, was transmitted to 
the committee on March 11, 1960. The report follows: 


INFORMATION RETRIEVAL ACTIVITIES AT IBM 


Data Processing Division 

The Information Retrieval Department of the Data Proc- 
essing Division is presently marketing its standard data 
processing equipment—the IBM 101 electronic statistical 
sorter, the IBM Collator, the IBM 305 RAMAC, the IBM 
650, the IBM 1401, and the IBM 700-7000 series of equip- 
ment. Every level of equipment from the simple sorter to 
the largest scientific computer has been implemented by IBM 
users in searching their files for information. 

Users of this equipment are usually concerned with three 
categories of information: document reference information, 
chemical compound information, and historical and/or real- 
time data. 

The search of document reference information encompasses 
such printed matter as technical reports, Government pub- 
lications, technical journal articles, and scientific papers. 
The document itself is not stored physically. Only informa- 
tion leading to it or possibly an abstract of it is stored. 
Punched cards, magnetic tape and magnetic disks serve as 
the storage media which are searched by the data processing 
equipment. For document reference information, almost all 
types of IBM equipment are being used. 

The search of chemical compound information gives refer- 
ence to the compound itself including a full description of 
everything required by the searcher. All ranges of IBM 
equipment are again used. 

Search of historical and/or real-time data gives rise to 
selecting information from tape or disk files and arranging 
selected information in special report form. These data 
usually take the form of measured, observed or operational 
data and this kind of search plus the added feature of report 
generation usually requires the capabilities of large-scale 
computers with random-access memories. 

In addition to marketing equipment for information-re- 
trieval applications, the Data Processing Division holds edu- 
cational seminars, gives machine demonstrations, and dis- 
tributes pertinent literature and case-study reports. This 
division is also offering training in the techniques of informa- 
tion retrieval to 350 Applied Science Field Representatives so 
that, ultimately, this group will provide systems assistance. 

As new systems and devices are developed and perfected 
by the Advanced Systems Development Division and Re- 
search groups, they will be marketed through the Data 
Processing and Federal Systems Divisions of IBM. 


Advanced Systems Development Division 


The Advanced Systems Development Division is responsi- 
ble for the exploration of frontiers of Information Retrieval 
applications which appear to be beyond the scope of mecha- 
nization with currently available products. One item of 
concern involves the concept of an index machine with a very 
large capacity memory to make it possible to contain a com- 
plete file of index records so that they can be mechanically 
searched to ascertain responses to particular inquiries. It 
is envisioned that with the current state of electronic arts, 
it is possible to prepare such an index machine so that large 
files of information, such as those contained in certain in- 
telligence agencies and military departments, in addition to 
large files of scientific information, can be made accessible 
to answer specific inquiries in a matter of seconds. 

This division is also addressing its efforts to photo docu- 
ment storage techniques whereby a large field of reports can 
be photographically reduced, page by page, to small images 
which are mechanically retrievable upon demand and copies 
of these document pages made available to the requestor. 

The Advanced Systems Development Division is also con- 
cerned with a further study in regard to the definition of 
problem areas in information retrieval. This study should 
have an important impact on the way analysts, scientists, 
and other research workers make known their inquiries and 
receive pertinent information in regard to their own partic- 
ular fields. The division has studied in detail the require- 
ments of one of the agencies in the intelligence community 
and has developed experts in the field who are capable of 
investigating many of the difficult problems concerning ade- 
quate dissemination of scientific information in Government, 
business, and industry. 


Research 


IBM Research is currently engaged in studying the re- 
quirements for the automatic processing of natural languages 
as they appear in running text. There are fundamental 
problems in this area related to information retrieval and 
language translation. A large program is underway today 
in the area of automatic translation of one language to an- 
other. This work centers about the development of general 
purpose lexical processing equipment—the photoscopic 
memory and the word analyzer. Studies in language in- 
clude dictionary compilation, particularly by automatic tech- 
niques, the development of models for sentence structure to 
permit automatic parsing, and the analysis of the interrela- 
tionship of word meanings. The overall system is being 
tested as it develops by a continuing program of translating 
experiments. In addition, understanding is being acquired 
to exploit the present high-speed scanning capability of the 
photoscopic memory in automatic indexing and abstracting. 

To unify the physical and linguistic results, work is also 
being done in information theory, operations analysis, and 
machine organization. Work in these areas is pointed to- 
ward establishing a solid scientific basis for future develop- 
ments. Fundamental studies are being initiated in the areas 
of information retrieval, military intelligence, and machine 
organization in addition to mechanized language translation. 


Federal Systems Division 


The Army has recognized the impact of advanced weapons 
on Field Army organizations. New organizational concepts 
have emphasized the principles of mobility, dispersion, and 
fast reaction. The resultant need for improved communi- 
cation and information gathering techniques was recognized. 
New devices are under development to meet this requirement. 

With improved techniques the potential quantity of data 
available is very large. Therefore, the need for automatic 
data processing has been established as a necessity in many 
applications. The Fieldata family of ruggedized computers 
is being developed to provide flexible basic equipment to 
meet growing Army needs. 

One of the major data problems in the Army will be the 
storage and retrieval of large volumes of data. In 1958 the 
Army awarded a contract to IBM to develop an IN- 
FORMER (Information Retrieval and Storage System) for 
Fieldata. To maximize flexibility the basic unit of the IN- 
FORMER is a general purpose digital computer. In addi- 
tion, a large random-access storage unit and specialized data 
search unit are provided. The INFORMER, in its shelter, 
will be mounted on a 2½-ton truck and will operate under 
field conditions. 

Initial application of the INFORMER is planned to be 
battlefield intelligence. The initial INFORMER specifi- 
cations did not outline ultimate requirements. These ulti- 
mate requirements cannot be defined today. As Army 
personnel apply the INFORMER they will learn more 
about their problems and evaluate the equipment. As their 
experience grows they will be able to specify their final re- 
quirements and fully utilize advanced information retrieval 
techniques. 


In addition to the above data, IBM also submitted seven progress 
reports on Project WALNUT, developed by its Advanced Systems 
Development Division at San Jose, Calif., which have been made 
a part of the committee’s files. Representatives of the CIA also 
briefed the staff in detail regarding the WALNUT project, its objec- 
tive and purpose, and outlined its present status of operation. 

The committee was supplied with a copy of proceedings of the 
IBM Information Retrieval Conference, held on September 21-23, 
1959, at Poughkeepsie, N.Y., certain extracts and conclusions from 
which were included in the premise of this report. 


ITEK CORP. 


Conferences were held between representatives of Itek and mem- 
bers of the staff during the month of March 1960 and, in response to 
the committee’s request, Mr. J. H. Carter, vice president and general 
manager of Itek’s Information Technology Center, Boston, Mass., 
requested Dr. Duncan E. Macdonald, formerly dean of the Boston 
University Graduate School and director of its Physical Research 
Laboratory, and now director of Itek’s Program Division, to submit 
the following report: 


Itek welcomes the opportunity to describe to you its 
achievements, activities and expectations on the future de- 
velopment of documentary systems for scientific information. 
Your survey report on the state of nationwide activity in 
this field will be of significant value to commerce and indus- 
try, as well as to those Government branches and constitu- 
ent agencies that are interested in current and potential solu- 
tions to their pressing problems in handling scientific infor- 
mation. 

The job of revising prevailing methods of communication 
and dissemination of scientific literature is a mammoth under- 
taking. Few other endeavors are comparably caught in such 
a squeeze between the weight of the past and the press of the 
future. Problems remain in large segments of the field with- 
out the availability of proven systems, the equipment, or 
techniques. In many applications, situations have existed 
where state of the art has been insufficiently advanced to 
provide a truly comprehensive solution. It is particularly 
important that a thorough analysis of an organization’s 
information handling system needs be made prior to selecting 
equipment. 

Itek was organized in September 1957 to enter the field 
of information technology and to produce the information 
systems and equipment, both military and commercial, which 
were recognized as vital to our national progress. The 
name of the company itself, a contraction for information 
technology, symbolizes our sphere of business interests. We 
have assembled experienced groups of specialists in a num- 
ber of diverse fields contributing to a frontal attack on the 
problem. We have been continuously engaged in research 
and development in graphic information as a major activity. 
In the reconnaissance fields, we have delivered advanced sys- 
tems and components, many of which have had substantial 
testing and are now in actual operation. 

We have been busily engaged in extending our research 
and development to produce the tools and media necessary 
to establish documentary systems in commercial applications 
as well. It will be most efficient for purposes of describing 
our program to first outline a few concepts common to most 
systems handling information in graphic form. 

Figure 1 shows in a block diagram several important func- 
tions that make up a documentary system for scientific in- 
formation. This diagram also applies to system designs for 
handling other types of data; for example, medical and legal 
documents. Functions of major interests are those associ- 
ated with the graphic store and searching tools and stores. 
If a documentary file is to employ modern mechanized con- 
cepts for the storage of its contents on photographic film, 
as contrasted perhaps to the storage of conventional books 
on library shelves, equipment and processes will be required 
to deliver output information in the form desired. The out- 
put forms of information desired cover a range such as to 
possibly include viewing, duplicating, printing, displaying, 
and transmission of information. After the step of record- 
ing an incoming document, an accession number can be added 
to enable storage and retrieval via the assigned number. 
The other ancillary functions shown in the block diagram 
relate to the problems encountered in providing a capability 
of searching for the accession numbers, or for document 
derivatives that identify the documents of interest in answer- 
ing a question posed to the system. 

[Figure 1. Block diagram of the functions of a documentary information system.] 

In processing a request for information, a search is often 
made of stored indexed data which contains some abbre- 
viated form of description of the original document. This 
necessitates that the original document be processed in a 
formal manner to produce the index data in a form which 
corresponds to that originally used in generating the index 
data for incoming documents. 
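
The arrangement described above can be sketched in Python as a small model: an incoming document is recorded and given an accession number, its index data are produced by a fixed rule, and a question is put through the same rule before the index is searched. The class, the function names, and the word-reduction rule are hypothetical simplifications, and strings stand in for the film images of the graphic store.

```python
import re

# Hypothetical list of words too common to serve as index terms.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to"})


def to_index_terms(text):
    """Reduce text to a set of lowercase index terms (one fixed, formal rule)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {word for word in words if word not in STOP_WORDS}


class DocumentaryFile:
    """Graphic store plus index data, both keyed by accession number."""

    def __init__(self):
        self._next_accession = 1
        self.graphic_store = {}  # accession number -> stored image
        self.index_data = {}     # accession number -> index terms

    def record(self, description, image):
        """Record an incoming document and return its accession number."""
        accession = self._next_accession
        self._next_accession += 1
        self.graphic_store[accession] = image
        self.index_data[accession] = to_index_terms(description)
        return accession

    def search(self, question):
        """Return accession numbers whose index data contain every question term."""
        wanted = to_index_terms(question)  # same rule as used for documents
        if not wanted:
            return []
        return sorted(a for a, terms in self.index_data.items() if wanted <= terms)

    def retrieve(self, accession):
        """Fetch the stored image by its accession number."""
        return self.graphic_store[accession]


doc_file = DocumentaryFile()
doc_file.record("Fatigue of titanium alloys", "<image 1>")
doc_file.record("Welding of steel", "<image 2>")
hits = doc_file.search("titanium fatigue")
print(hits, [doc_file.retrieve(a) for a in hits])
```

Keeping the index data apart from the graphic store means the search step handles only accession numbers, so the searching method can be changed without touching the stored images.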

Other aspects of the total scientific information problem 
include activities of publication, communication, and dis- 
semination of documents before filing. The outer boundaries 
of the information system we are concerned with perhaps can 
be said to begin in the author’s mind and to terminate in 
the consumption of the information by the user. 


ACTIVITY 
A. Military systems 

Several major projects have been completed, or are under- 
way for military agencies handling intelligence and recon- 
naissance information. Descriptive detail on the equipment 
and its performance levels may not be disclosed for obvious 
security reasons. The equipment in these systems includes 
individual and group projection viewers for examination 
and analysis of graphic information retrieved from storage, 
high speed continuous printers for producing high acuity 
copies of the original information photographed, film proc- 
essors, cameras, storage units, and associated analog and 
digital electronic circuits and computers for the operation 
of integrated systems handling massive quantities of graphic 
information. 

Techniques employed in these systems include not only the 
more conventional roll film handling methods, but also new 
techniques of handling individual pieces of film of uniform 
size on which are stored blocks of information, thereby pro- 
viding unit records. This approach opens up entirely new 
degrees of flexibility in designing information handling sys- 
tems requiring repetitive high speed storage and retrieval, 
simultaneous use of information from central storage by 
several users, and where additions to the file of new or more 
complete information are necessary to replace older material 
which is then purged from the system. 


B. Documentary systems 


Itek Corp. has developed the necessary techniques to build 
efficient, high speed, and long life equipment for handling 
graphic and documentary images. This includes the design 
of a variety of photographic format sizes for handling sub- 
ject material ranging from a printed page to high acuity 




aerial photography. A brief listing of the specifications 
which are met by the techniques which we have developed 
is offered in table I. 

Working engineering models have been built to demon- 
strate the performance of equipment designs employing im- 
proved methods of scanning, sorting, filing, retrieving, and 
so on. Several industry and Government applications are 
under study that can profit by the installation of graphic 
files of this design. 


C. Research in input processing 


We are prepared to undertake the design and construction 
of index search selectors in several forms and are currently 
engaged in associated studies. Our main research interest 
and activity in searching methods at this point in time, how- 
ever, is in the development of more powerful procedures 
for processing the information contained in the original 
documents in order to produce index data that will permit an 
efficient and discriminating search for the desired informa- 
tion. The significance of this emphasis is appreciated when 
one considers the staggering volume of existing scientific data 
in documented form and the rate at which this volume is in- 
creasing. A basic research study of methods for the production 
of index data for input to a system is being conducted at Itek 
with National Science Foundation sponsorship. 
The approach under study provides procedures for the nor- 
malizing of input expressions in natural language and proce- 
dures for representing them. Considerable promise is evi- 
dent that the normalized representation when more fully 
developed can be applied to provide index data of greater 
discriminating power that will permit the selection process 
to cover vast quantities of material and identify the relatively 
few documents of interest. No universal system concept can 
be said to exist at Itek or elsewhere on this part of the in- 
formation problem, however, for here is embraced a full 
range of basic problems in communication and language use. 

It should be recognized that the sophistication required 
in the indexing methods of various information systems will 
vary with the nature of the material and its pattern of use. 
Our research is focused on those applications where high 
volume, high growth rate, and the requirement for high 
quality retrieval are present. Unless a system can be so de- 
signed that its outputs satisfy the needs of the system users, 
it will be difficult to justify the operating and equipment 
costs of a large system. To achieve outputs which will 
satisfy system users, a maximum of searching power or dis- 
criminating power is desired while, at the same time, obtain- 
ing economic feasibility in a system design. 


D. Experimental systems 


While basic facets of the indexing problem are being exam- 
ined under a research program, we are also studying less basic 
but still important problems encountered in operating inte- 
grated mechanized systems for handling information. Typi- 
cal are the problems of normal language input, conversions 
of machine code from one to another and methods of handling 
new characters and symbols. Solutions to problems of this 
kind are synthesized and tested on an experimental mecha- 
nized system. This equipment is available for application to 
research on special problems in information handling. 
When not so engaged the system is used to process Itek 
library material serving our own technical staff while at the 
same time providing a working system, accumulating data 
on library usage. 

E. Future activity 


Continued emphasis at Itek will be placed on basic research 
in the development of index data-processing methods. 
Graphic handling equipment will be applied to those infor- 
mation systems which are adequately handled by existing 
concepts. 

We anticipate continued evolutionary development in 
methods of input processing for index data. This should be 
taken into account in the design of scientific information 
systems which may, because of urgency, be pressed into serv- 
ice with today’s limited techniques. 

Figure 2 (p. 230) illustrates the point. Here, we consider 
the idea of a search selection service for use by several agen- 
cies as supplementary to their own existing or future informa- 
tion systems. As clients of the central service, each may own 
their own graphic files and/or regular library for fast ac- 
cess to documents, or they may not possess any substantial 
collection of documents. Besides offering a more favorable 
load factor in searches of a large body of literature and other 
operating economies, this concept offers a minimum in con- 
fusion and expense in updating the searching operation as 
continuing progress is made. Great strides can be expected 
in both the method of producing index data and in the design 
of search selector equipment. 

Although the centralized operation of a search selector ap- 
pears feasible and advantageous in supplementing individual 
agency efforts, it does not follow that a complete job of input 
processing of new documents for index data can be done on a 
centralized basis. Input processing for index data ideally 
covers every document of interest to any system user and 
therefore might best be done by individual client agencies in 
all cases other than for commonly available publications. 
The processed index data would then be available for system- 
wide use after transmittal to the central search activity. 

Two additional points should be considered in early efforts 
to apply mechanized methods of handling graphic informa- 
tion in Government operations. 

The first of these is the importance of realizing that han- 
dling graphic data in the quantities and speeds of interest is 
a new art. This is not a warmed-over problem in “data proc- 
essing” where existing equipment, principally the digital 
computer, has done such a creditable job. New equipment 
based on entirely different concepts is necessary to efficiently 
handle graphic or documentary information. 

Lastly, it is very important in our view that efforts to apply 
these mechanized methods in Government operations foster 
rather than detract from a strong research and development 
program. To this end it would be advisable to establish an 
experimental data or pilot system for new methods in order 
to assure the existence of an environment in which to try out 
new techniques, methods and equipment. Once a large in- 
formation system becomes operational, it is unlikely that it 
will be available as a proving ground so essential for con- 
tinued growth of a technology now in its infancy. 


F. Summary 


Itek Corp. has enjoyed considerable success in the devel- 
opment of effective graphic file equipment design and is ac- 
tively engaged in studying application opportunities. A 
major research program of investigation continues on index- 
ing methods and their relation to the design of mechanized 
systems. We feel that it is important to continue to press 
for more knowledge of the significant features of natural lan- 
guage which are the common media for conveying informa- 
tion in many existing communication channels. Methods 
for determining language features, and arriving at feasible 
procedures for dealing with them, will require effort of high 
competence. Advances in this area will be evolutionary for 
some time and should be allowed for where possible in the 
design of systems pressed into early service because of urgent 
needs. 


DEVELOPMENT IN INFORMATION HANDLING SYSTEMS 


1. Present nonmilitary equipment developments 


A. Storage medium: Photographic film is used with choice 
available between roll film or individual film chips for use 
as a unit record. A range of format width sizes from 16 
millimeter to 70 millimeter can be selected as the physical 
handling methods have been shown to work over this range. 

B. Resolution of film image: Highest quality optics avail- 
able permitting wide choice of high acuity film materials. 

C. Storage volume: 500 to 2,000 chips per magazine ac- 
cording to system requirements. 

D. Flexibility: Modular construction permitting multi- 
purpose use of component units. Maximum consideration 
given to compatibility with existing data handling equip- 
ment and to provision for growth in customer’s need. 

E. Index search selector: Separate from graphic file. Pre-
sents an accession number to the graphic file. Need varies
widely with application; sometimes function performed by 
digital computer. 

F. Random sequence: Each chip carries its own accession 
number thereby permitting high speed return to storage at 
random where desired in system design. 

G. Scanning means: Electronic, permitting handling and 
searching speeds of 50 to 500 chips per second according to 
requirements. 

H. Life: All film handling equipment units designed for 
an absolute minimum of contact to the image area of the film. 
Transport means do not stress the film material. 







I. Code structure: Variable, but inclusive of common 6-bit 
word length for English characters. 

J. Read out: Simultaneous user use. Chips are not neces- 
sarily removed from equipment in central system for return 
upon use or duplication. 


PUBLICATIONS 


As an indication of the diverse scientific interests of Itek 
personnel, which bear directly on the information handling 
field, the following selected list of professional publications 
is offered:


Barakat, Richard G., “On the Transient Stage of the Random Dis-
persal of Logistic Populations,” Bulletin of Mathematical Biophysics,
vol. 21 (June 1959), pp. 141-151. 

Barakat, R. G. and Barakat, R. A., “Optical Studies of the Diffraction 
of Water Waves by Circular and Thin Elliptic Cylinders,” Journal 
of Applied Physics, vol. 31 (March 1960), pp. 474-478. 

Barakat, Richard G., “Transient Diffraction of Scalar Waves by a 
Fixed Sphere,” Acoustical Society of America, Journal, vol. 32
(January 1960), pp. 61-66. 

Berman, E., et al., “Photochromic Spiropyrans. I. Effect of Substi-
tuents on the Rate of Ring Closure,” Journal of the American Chemical
Society, vol. 49, pp. 724-728 (1959).

Foreman, William T., et al., “Ultraviolet Transparent Alkali Metal 
Filters,” Journal of the Optical Society of America, vol. 49, pp. 
724-728 (1959). 

Foreman, William T., “Streaming Birefringence and Optical Relaxa- 
tion Time of Vanadium Pentoxide Sols,” Journal of Chemical 
Physics (January 1960). 

Foreman, William T., et al., “The Role of Surface Tension in Rayleigh- 
Bernard Instability,” Journal of Fluid Mechanics (Spring 1960). 
[In press.]

Howell, Hutson K., “Photographic Emulsions for Missile Photography,” 
Photographic Science and Engineering, vol. 2 (August 1958), pp.
95-104. 

Kuipers, John W., A Research Program on Information Searching 
Systems. Presented before the Combined Sessions of International 
Federation for Documentation, Deutsche Gesellschaft für Docu-
mentation, Gemeinschaftsausschuss der Technik, and the Gmelin-
Institut on “Automatic Documentation in Action,” Frankfurt/Main,
June 9 to June 13, 1959. Itek Publication P-116. 

Kuipers, J. W. and Williams, T. M., “A Program of Research and 
Development on Information Searching Systems,” Waltham, Mass. : 
Itek Corp., 1959. 

Lipetz, Ben-Ami, “A Successful Application of Punched Cards in 
Subject Indexing,” American Documentation, vol. 11 (July 1960), 
[In press. ] 

Nugent, William R., A Machine Language for Document Trans-
literation. Paper presented before the Fourteenth National Con-
ference of the Association for Computing Machinery at Massachu-
setts Institute of Technology, Cambridge, Mass., September 1, 1959.

O’Neill, Edward L., “Graininess and Entropy,” Optical Society of 
America, Journal, vol. 48 (December 1958), pp. 945-947. 

O’Neill, E. L., Noise, Entropy and Filtering in Photographic Optics. 
Paper presented before the Fifth Meeting and Conference of the 
International Commission for Optics (ICO), Stockholm, August 24 
to August 30, 1959. Conference on Modern Systems for Detecting 
and Evaluating Optical Radiations. 

O’Neill, E. L. and Asakura, T., “Hermitian Matrices Study.” Waltham, 
Mass. : Itek Corp., 1959. 

O’Neill, E. L. and Asakura, T., “Image Formation in Terms of 
Entropy,” Waltham, Mass.: Itek Corp., 1959.

O’Neill, E. L. and Kornstein, E., “Grain Study,” Waltham, Mass.:
Itek Corp., 1959. 

Shannon, R. R., “Comparison of Image Evaluation Methods,” 
Journal of the Optical Society of America, vol. 49, p. 506 (1959). 







Shaw, C. H. and Foreman, W. T., “Ultraviolet Transparent Alkali 
Metal Filters,” Optical Society of America, Journal, vol. 49, pp. 
724-728 (July 1959). 

Swing, R. E. and Barry, S. A., “A Theory of Monobath Kinetics,”
Waltham, Mass. : Itek Corp., 1959. 

Williams, T. M., “From Text to Topic in Mechanized Searching Sys- 
tems,” paper presented at National Symposium on Machine Trans- 
lation, Los Angeles, February 2 to 5, 1960. Itek report RPIS 60-8. 


[Figure 2 (Itek). Diagram not reproduced; surviving label: “Agency K, etc. Some with graphic files, some with conventional libraries, some with neither but accessibility to both.”]











JONKER BUSINESS MACHINES, INC.












































Following consultations with the staff of the committee, and in 
response to the staff’s request, Mr. Frederick Jonker, president of 
the Jonker Business Machines, Inc., submitted a summary report, in 
which he pointed out that 


The problem we are concerned with is the problem of the 
flow of scientific information. The flow of the information 
itself at the present state of the art no longer presents any 
technical problems. It can be handled by means of presently 
available microfilm and copying techniques. The real prob- 
lem is the flow of the index information which tells where 
information of a certain nature, or the answers to certain 
problems, can be found. 


Also, at the suggestion of the staff, Mr. Jonker furnished the com- 
mittee with two preliminary proposals, outlining a plan for a nation- 
wide network for the flow of scientific and technological information, 
and concerning equipment for use in the proposed nationwide in- 
formation flow network. These proposals are included in this report, 
as follows: 





[Preliminary proposal I]


OUTLINE PLAN FOR NATIONWIDE NETWORK FOR THE FLOW OF 
SCIENTIFIC AND TECHNOLOGICAL INFORMATION AND PROPOSAL 
FOR FURTHER DEVELOPMENT OF THIS PLAN * 









Summary of the proposal 
This proposal rejects the notion of one large “billion dollar” 
national information center, in which a large “multimillion 
dollar” computer answers questions. (There is a place for 
the computer of no lesser importance; namely, in the prepara- 
tion and sorting of data by computer service bureaus, for 
participants in information flow program.) 

This proposal likewise rejects the notion that the informa- 
tion flow problem requires an artificial universal index 
language. It is felt that such solutions are not only theoreti-
cally incorrect, they would also be unenforcible as well as
disastrously expensive. Even if feasible their realization 
would take many years and billions of dollars. 

According to the present proposal, a beginning of the 
solution to this problem can be made now at a very modest 
cost. Nearly all of the required techniques and equipments 
are already available or can soon be available. 

The present proposal is based on the use of the basic vo- 
cabulary of the English language. These basic words will 
form the common language or universal index language. Ac- 
cording to our proposal it will be possible to enter any present 
information collection, whether based on a classification sys- 
tem, an alphabetic subject heading list or any form of key- 
word indexing into the nationwide information flow system. 
This could best be achieved by the use of search systems 


*Submitted to the U.S. Congress, the National Science Foundation, and the National 
Academy of Sciences. 




based on the so-called inverted grouping, or term-organized 
search systems. Probably 90 percent of all search systems 
now on the market and specifically intended for the informa- 
tion retrieval problem are of this nature. They include small 
manual systems varying in cost between $10 and $100. They 
also include conventional punched card systems as well as 
the large electronic general purpose computer. With the use
of the proposed indexing techniques all of these equipments 
become compatible and data can be transferred from one to 
another at extremely low cost. 

According to the proposal, individual scientists and smaller 
organizations could index their data and enter them into a 
system involving $10 to $100 in equipment. Copies of these 
records then flow to data collection centers such as head-
quarters of societies, or large companies. These data collec-
tion centers then exchange information. They will answer
specific questions from individuals and disseminate frag-
ments of their collections to individuals upon a stated
interest or “need-to-know” basis.

The main problem that remains will be the actual opening 
up of the channels of information: First, within each science 
or discipline; next, between different sciences and disciplines 
in order to promote cross-fertilization between the arts and 
sciences. This actual opening up of the flow channels will 
require far more money than research on indexing methods 
and search equipment. 

The present proposal specifies the research work and test
work required to solve the detailed problems of the proposed
indexing methods and utilization of equipment as well as 
the research work needed to determine the most efficient way 
of opening up actual information channels. After this re- 
search work it is proposed to put the results to work on a 
full-scale demonstration project involving a selected field of 
science. 


Nature of the problem 


Suitable low-cost microfilm techniques for the handling 
of the flow of the information itself are already available. 
Practically all problems center around the flow of the “in- 
dex information,” which tells where information of a cer- 
tain nature, or the answers to certain questions can be found. 
The “information problem” is the problem of “making any 
information generated anywhere, available to anybody 
anywhere else.” 

The proposed information flow network visualizes two dif- 
ferent types of information flow: 

I. Convergent flow from the individual scientist to data 
collection centers, such as the headquarters of large industrial 
organizations or headquarters of the societies of the various 
arts, sciences, and industries. This convergent flow remains 
entirely within one field of science or industry. 





II. Divergent flow from the various data collection centers to:


(a) Individual scientists or smaller organizations 
within the field of science or industry served by that 
center ; 

(b) Other data collection centers serving other
sciences and industries as well as to smaller organiza- 
tions or individual scientists in these other fields. This 
flow is particularly important as nearly all human prog- 
ress stems from cross-fertilization between the arts and 
sciences. 

These information flow problems comprise two different 
aspects, namely:

1. The indexing problem; 

2. The problem of the search equipment. 

The national information problem is an economical rather 
than a technical problem. Many successful techniques of 
indexing and many types of search equipment are already 
available. The problem is to find techniques and equipment 
low enough in cost and simple enough to be practical for
mass utilization in a nationwide network covering all fields of
science and technology, and including individual scientists 
and small laboratories as well as large organizations. 

For example, satellite reconnaissance and inspection data 
and other monitoring data may run into billions of items of
information. With some of the present techniques and 
equipments the cost of indexing and storing these data alone 
would run into billions of dollars. It seems therefore clear 
that the Nation can afford only the simplest and least expen- 
sive techniques. The present cost factor of several dollars a 
document will have to be reduced to nickels and dimes or even 
to pennies. This applies to indexing techniques as well as 
machine retrieval systems. 

The indexing problem 


Indexing methods have been proposed for the national in- 
formation flow problem which are based on the use of a (pro- 
= elaborate and complicated “universal” artificial index 
language. We maintain that—

(a) Such an artificial language can only be developed 
for a limited field of science. It can be shown that it 
can in theory not be a universal language usable for all 
arts and sciences. 

(b) Secondly, we believe that the use of such a com-
plex index language would be absolutely unenfor-
cible.

(c) Lastly, these indexing techniques are so laborious 
that their cost is prohibitively high. 

However, in the last 5 years very simple and very inex- 
pensive indexing techniques have been developed which can 
best be described as keyword indexing or key-concept index- 
ing. At first these techniques were pioneered by a few small 
companies under names like “Uniterm” indexing and “De-
scriptor” indexing. They now have—in a number of different
variations—found practically universal acceptance for
machine searching.

Moreover, it is the only method of indexing that is open to 
complete automation, that is the performance of the “intellec- 
tual” task of indexing on the general purpose computer (on 
a “word-count” basis). Great progress has recently been
made in these techniques.

These methods of indexing are all based on the use of the 
natural language. 

Our proposed solution to the national information flow 
problem is likewise based on the use of the natural language. 
It is based on the use of the basic vocabulary of the English 
language. These basic terms form a “universal index lan- 
guage” already in existence and already in universal use 
throughout all of the arts and sciences. 

The solution proposed by us does not base itself on any par- 
ticular indexing system. We propose that all “classes” or 
“subject headings” or “Descriptors” be entered into a ma-
chine system simply as “coordinations” of the basic terms
which make up these classes, subject headings, or descriptors. 

To give a simplified example, the class “underdeveloped 
countries, Africa” could be entered simply as three separate 
words “underdeveloped,” “country,” and “Africa.” Since
all classes and subject headings are made up out of the same 
basic terms, practically any preexisting information col- 
lections could in this manner be entered into the nationwide 
information flow network. 
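
The coordination idea in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `basic_terms` and `post`, the small normalization table, and the accession numbers 101 and 102 are hypothetical and do not come from the proposal.

```python
import re
from collections import defaultdict

# Hypothetical normalization table mapping a surface word to its basic term.
# A working system would need an agreed vocabulary list in its place.
BASIC_FORMS = {"countries": "country"}


def basic_terms(heading):
    """Split a class or subject heading into its basic English terms."""
    words = re.findall(r"[a-z]+", heading.lower())
    return [BASIC_FORMS.get(word, word) for word in words]


def post(term_records, heading, accession_number):
    """Post an item's accession number on the record of each basic term."""
    for term in basic_terms(heading):
        term_records[term].add(accession_number)


term_records = defaultdict(set)
post(term_records, "Underdeveloped countries, Africa", 101)
post(term_records, "Africa, mineral resources", 102)

# A search "coordinates" terms by intersecting their postings.
hits = term_records["underdeveloped"] & term_records["africa"]
print(sorted(hits))
```

Because the heading is stored only as postings under its component words, the same item can later be found by any combination of those words, whatever indexing scheme originally produced the heading.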

The above example shows the simplest approach to the 
problem. However, in certain cases somewhat more complex 
relationships than “coordination” (so-called interfixes) 
between terms, may be required. These can be handled by a 
combination of two different methods: 

1. Splitting an item of information into two or three 
other items of information. 

2. Enlarging the “vocabulary” by recognizing different 
“roles” and meanings of the same English word. There is a 
great possible variety in these “roles” and meanings. It is, 
for example, possible to distinguish between a term as a verb, 
or a noun or an adjective, or the roles can be quite arbitrary. 
Certain terms can be “tied to” other terms by means of suf- 
fixes, etc. 

In this manner all indexing methods can be reduced to com- 
mon denominators and all become compatible. It is then 
possible to enter classification systems and subject-heading 
systems into the nationwide information flow network, either 
with or without the recognition of hierarchy or interfixes. 

For a search the machine system can be interrogated by 
any of these systems. Data originally indexed by a classi- 
fication system can be searched by subject headings or key- 
words. Data originally indexed by keywords can be searched 
by a classification system, etc.

In this manner the problem within each limited art or
science can be solved for the purposes of this particular art. 







The need for a more precise language for index purposes 
is generally recognized. To arrive at this more precise lan- 
guage, the societies and associations of the various arts and 
sciences should be encouraged to accelerate their programs 
of standardization of terminology and generation of
nomenclature. This will solve the problem of index 
terminology for those collections that deal only with a par- 
ticular art or science. 

The problem of exchange of information between the data 
collection centers representing the various arts, sciences, and 
fields of technology is more difficult (for example, biologists 
may want to search electronic data collections, etc.). This
comes down to making a highly professional information 
collection searchable by a layman in that particular field and 
may require a Thesaurus to translate the layman’s question 
into professional terminology (for example, the biologist 
would be a layman in electronics). However, this would 
merely complicate the phrasing of the search. It would not 
complicate the flow of the search information. 
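
A minimal sketch of the Thesaurus step described above, assuming a simple lookup table, may help show why only the phrasing of the search is affected. The thesaurus entries, the term records, and the function names `translate_query` and `search` are all hypothetical illustrations, not part of the proposal.

```python
# Hypothetical thesaurus fragment: a layman's word mapped to the
# professional index terms it may stand for.
THESAURUS = {
    "valve": ["electron tube"],
    "crystal": ["semiconductor diode", "piezoelectric crystal"],
}

# Hypothetical term records of an electronics collection (term -> item numbers).
TERM_RECORDS = {
    "electron tube": {10, 11},
    "amplifier": {11, 12},
    "semiconductor diode": {13},
    "piezoelectric crystal": {14},
}


def translate_query(lay_terms, thesaurus):
    """Replace each lay term by its professional alternatives.

    Terms not found in the thesaurus pass through unchanged.
    """
    return [thesaurus.get(term, [term]) for term in lay_terms]


def search(term_records, lay_terms, thesaurus):
    """Require every lay term; accept any alternative for each of them."""
    result = None
    for alternatives in translate_query(lay_terms, thesaurus):
        postings = set()
        for term in alternatives:
            postings |= term_records.get(term, set())
        result = postings if result is None else result & postings
    return result if result is not None else set()


print(search(TERM_RECORDS, ["valve", "amplifier"], THESAURUS))
print(sorted(search(TERM_RECORDS, ["crystal"], THESAURUS)))
```

The term records themselves are never altered; the translation happens entirely on the question before it reaches the search records.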

Thus, the use of the basic vocabulary of the English lan- 
guage seems quite feasible. However, one more problem 
remains: As science and technology progress, new scien- 
tific and technological concepts are constantly being formed. 
As a result old terminology is changing its meaning or dis- 
appears and is replaced by newer terminology with a dif- 
ferent level of generality, and which overlaps the meaning 
of other terms in a different manner. No artificial index 
language can cure this. It is inherent in the processes of 
technological and scientific progress. 

However, this changing and expanding terminology will 
require a constant reediting of the vocabulary and updating 
of the system. This can be economically performed only 
when using machine systems based on so-called inverted 
grouping (inverted grouping will be explained in sec. 3). 

Machine systems based on inverted grouping use records 
dedicated to terms on which serial numbers of information 
items have been recorded. As a result the editing can be per- 
formed simply by changing the name of the term records or 
by combining or splitting term records. These are all
extremely fast and inexpensive operations. 

(Examples: Terms can be changed simply by changing 
the “label” attached to the term records. Synonyms and 
semisynonyms can be eliminated simply by combining the 
postings of both term records on a new record. Terms like 
“navigation” can be split into “celestial navigation” and 
“magnetic navigation” by taking two new term records with 
the coordinations of the terms “celestial” and “navigation” 
and the terms “magnetic” and “navigation.” To do this sort 
of editing on conventional systems based on “item records” 
instead of “term records” is—though possible in principle— 
prohibitively complicated and expensive.) 
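
The three editing operations in the examples above can be sketched as follows, modeling each term record as a set of item serial numbers. The function names and the sample postings are hypothetical; the sketch only illustrates why each operation touches a handful of term records rather than every item in the collection.

```python
def rename_term(records, old, new):
    """Change the label on a term record; the postings are untouched."""
    records[new] = records.pop(old)


def merge_terms(records, first, second, new):
    """Combine the postings of two synonymous term records on a new record."""
    records[new] = records.pop(first) | records.pop(second)


def coordinate(records, broad, qualifier):
    """Postings for a narrower term, taken as the coordination of two records."""
    return records[broad] & records[qualifier]


# Hypothetical term records: term -> serial numbers of information items.
records = {
    "navigation": {1, 2, 3, 4},
    "celestial": {1, 2, 9},
    "magnetic": {3, 7},
    "aeroplane": {5},
    "airplane": {6},
    "wireless": {8},
}

rename_term(records, "wireless", "radio")
merge_terms(records, "aeroplane", "airplane", "aircraft")
records["celestial navigation"] = coordinate(records, "navigation", "celestial")
records["magnetic navigation"] = coordinate(records, "navigation", "magnetic")

print(sorted(records["celestial navigation"]))
print(sorted(records["magnetic navigation"]))
```

On a file organized by item records, the same changes would require reading and rewriting every item that carries the affected terms.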




The problem of the search equipment


The thinking of the Nation on this problem has to a very 
large extent been influenced by science fiction thinking. In 
this, machines are visualized to which we can address a ques- 
tion and which present us with a printed copy of the answer. 
The next step taken is to have one such gigantic “computer” 
placed in a “national information center,” to answer all prob- 
lems telephoned into this center. 

While still a favorite with the lay public, this concept is 
now squarely rejected by most professionals. The reasons
are: 

I. It is generally better to separate the search mechanism 
from the information store because— 

(a) The information flow problem is actually not the 
problem of flow of the information itself, but the flow of 
the index information which allows one to find out where 
information pertaining to certain subjects is available. 
The flow of the information itself is therefore actually a 
secondary problem. 

(b) As long as the information itself and the index
information are kept separate, the cost of the system can 
be a small fraction of the cost of a system in which both 
are combined. 

(c) Moreover, there are already many different types 
of (microform) information storage systems in actual 
use. By keeping the search mechanism and information 
storage mechanism separate, such information collec- 
tions can be tied into the nationwide network. 

II. The present feeling is that centralization should not go 
beyond a data collection center for each of the arts and sci-
ences. These can then exchange copies of their (punched-
card or magnetic tape) “search records.”

Tens of millions have in the last 10 years been spent on 
various large search equipments costing on the order of one- 
half to several million dollars. Most of these were based on 
the concept of the large central search system. However, 
these systems have so far found few applications; their cap- 
ital investment is very high and they are very expensive and 
complicated in operation. 

Meanwhile, despite this prevailing belief in the large elec- 
tronic “pushbutton” central-data systems, under the sheer 
pressure of necessity a totally different approach sprang up. 
This is the inverted approach. The Uniterm system of 
coordinate indexing and the superimposable card system 
(also known as Termatrex, Peek-a-boo, or Batten principle) 
are two of the simpler systems based on this approach. How- 
ever, punched-card collators and various magnetic tape sys- 
tems likewise belong in this category. Electronic computers 
can also be made to operate according to these principles. 

Conventional search systems have a record of every in- 
formation item and the terms describing this item are re- 
corded in code upon that record. Inverted search systems 







have a record for every term used in the system and the items 
are recorded in code on the term records.
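
The difference between the two kinds of record can be made concrete with a short sketch. The sample items, terms, and function names below are hypothetical; the code only contrasts a file of item records, which must be read through in full, with a file of term records, where only the records named in the question are consulted.

```python
from collections import defaultdict

# Conventional file: one record per item, listing the terms describing it.
item_records = {
    1: {"radar", "antenna", "design"},
    2: {"radar", "weather"},
    3: {"antenna", "design"},
    4: {"radar", "antenna"},
}


def search_item_records(items, query):
    """Conventional search: every item record is examined in turn."""
    return {number for number, terms in items.items() if query <= terms}


def invert(items):
    """Build the inverted file: one record per term, listing item numbers."""
    terms = defaultdict(set)
    for number, item_terms in items.items():
        for term in item_terms:
            terms[term].add(number)
    return dict(terms)


def search_term_records(terms, query):
    """Inverted search: only the records of the query terms are consulted."""
    postings = [terms.get(term, set()) for term in query]
    return set.intersection(*postings) if postings else set()


term_records = invert(item_records)
query = {"radar", "antenna"}
print(sorted(search_item_records(item_records, query)))
print(sorted(search_term_records(term_records, query)))
```

Both searches return the same items; the inverted form reads two term records here, however many items the collection holds.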

Inverted systems have the following advantages : 

1. Their search speeds can be hundreds or thousands of
times greater. 

2. The equipment cost per search can be hundreds of times
lower.

3. It is possible to merge data collections and adapt the 
collections to changes in terminology, by very simple, fast 
and inexpensive procedures. 

The merits of inverted grouping for information retrieval 
systems have found fairly wide acceptance and probably
90 percent of all data processing systems especially designed 
for information retrieval purposes are based on inverted 
grouping. All of these systems are compatible, that is, data 
can be transferred from one system to another in a relatively 
inexpensive and simple manner. Inverted machine systems 
therefore form the basis of this proposal. 

Inverted search systems can be set up for as low as $10 
in equipment (so-called terminal-digit card systems or Uni-
term-card systems). These manual systems could, for exam-
ple, be used by individual scientists.

For larger collections and more severe retrieval require- 
ments, simple machine systems are available in the price 
range of $25 to $250 (superimposable card systems). 

Copies of the records from either of these systems could be 
sent to a more central location and put into larger machine 
systems (such as punched-card collators or file computers). 

(We ourselves have equipment in the planning stage
which uses the same records, from the smallest to the 
largest collections. As these records are collected at 
data centers, they are simply spliced together. Data can 
be entered into these records with equipment costing 
about $200. At central collection points these records 
are copied and simply spliced together. Scanners to 
scan the records at central points will cost a few thou- 
sand dollars and can search millions in a matter of min- 
utes. Smaller scanners for individual users and small 
organizations will run from $100 to $500. 

We believe that these machine systems can perform the 
same function as the collators and file computers at only 
a very small fraction of their cost and complications. 
We have therefore supplemented this proposal by another
proposal for the development of this equipment. How-
ever, the present proposal does not hinge on or is not
conditional on the availability of the proposed equip-
ment. The proposed nationwide system is feasible with
presently available equipments.)


Proposed study, test, and demonstration work 


I. A basic study regarding the nature of “index-language” 
or “index-terminology.” This study includes all indexing
systems based on natural language as well as proposed systems
based on artificial index language. This study work
will reduce all indexing systems to the same common denomi- 
nators. All forms of “interfixes,” hierarchy, etc., will become 
part of the “vocabulary.” The same applies to search func-
tions involving concepts like “greater than,” “smaller than,”
“before” or “after” etc. 

The reduction of all indexing systems (including artificial 
index languages) to coordination of natural language terms 
(of a more complex nature) is in theory quite feasible, and is 
already in experimental operation at a number of places. 

II. A basic study regarding the convergent and divergent 
information flow concepts outlined previously. This study
will deal with the two aspects of the problem:

(a) The indexing; 
(b) The search mechanisms.

The study will probably reveal a number of alternative 
indexing methods and search mechanisms able to meet the 
technical requirements of the convergent-divergent informa- 
tion flow concept. 

The study will particularly emphasize exchange of infor- 
mation between centers serving unrelated arts such as biology 
and electronics, etc. 

III. An economic study, to find indexing methods and 
search mechanisms for new information collections which 
provide acceptable retrieval detail at the lowest possible cost. 
This will include such problems as whether the greater pin- 
pointing capacity of indexing and retrieval with various 
types of interfixes is warranted by the higher cost. It will 
include a comparative cost study of total cost in operation of 
the various present and proposed search equipments. 

IV. A survey to determine what types of information 
most vitally affect our scientific and technological progress.
The purpose of scientific progress is technological progress, 
so the two cannot be separated. This study will attempt to 
cover the most important fields of science, applied science, 
as well as the most basic industries. 

V. A survey and study regarding the “political” and or- 
ganizational and financial aspect of the opening up of infor- 
mation flow channels, in the arts, sciences and industries 
which are most vital to progress. This concerns, of course, 
information flow within a certain art or discipline as well 
as information exchange between the various arts and disci- 
plines. Copyright and royalty problems will likewise be 
analyzed in close cooperation with standing committees work- 
ing on this problem. The survey will also include problems 
like Department of Defense security-classified information 
and commercial company-confidential information. 

VI. The creation of a test project, simulating the actual 
conditions of the nationwide problem on a small scale, with- 
in a certain selected organization or group of organizations. 
In the course of this test project all of the problems that are 
encountered will be ironed out and the necessary corrections 
and adjustments will be made. 







[Preliminary proposal II]


OUTLINE PLAN FOR SIMPLE LOW-COST EQUIPMENTS FOR USE IN 
A NATIONWIDE NETWORK FOR THE FLOW OF SCIENTIFIC AND 
TECHNOLOGICAL INFORMATION AND PROPOSAL FOR THEIR 
DEVELOPMENT 


Summary of the proposal 


Proposal I outlined a plan for a nationwide information 
flow network, utilizing presently available equipment based 
on what is called “inverted grouping” or “term organized”
systems. 

However, most of these equipments are multipurpose equip- 
ments too expensive and complicated to operate for mass 
application. We therefore propose the development of very 
much simpler low-cost equipment especially designed for 
mass applications. 

The proposal first describes a series of presently available 
(so-called Termatrex) information retrieval systems, which 
would form the basis of the proposed equipments. Next the 
proposal describes the planned extension of this equip- 
ment. The system together with the presently available 
Termatrex equipment will provide extremely simple search 
equipments for a nationwide information flow network at 
one-tenth the cost or less than the next cheapest (but still im- 
practical) alternative solutions. 

To provide a simple and inexpensive tie-in with the many 
already existing punched-card systems and computer infor- 
mation-search systems, some additional equipment might be 
desirable. 


Main characteristics of the Termatrex systems 

The Termatrex systems are a family of “modular” in- 
formation retrieval systems. 

They are designed for information “finding” or “locat- 
ing” purposes only. All they give are the serial numbers of 
those documents which provide the answers to a certain 
question. 

Because these systems are single-purpose equipment, in- 
tended only as an information locating device, they are the 
simplest, and cheapest information systems on the market. 
They are also as fast or faster in operation than even the 
most sophisticated devices 10 to 100 times higher in cost. 

The main advantages of the Termatrex systems over other 
machine systems are: 

1. Even for the largest collections, searches can be per- 
formed in a matter of 1 or 2 minutes. For collections of 
millions of items, hundreds of searches per day may be re- 
quired. Because of the principle of “Simultaneous scanning” 
of a complete collection or large sections of the collection, 
such search loads can be met without costly duplication of 
equipments. 

2. Termatrex systems are extremely inexpensive. Infor- 
mation collections of up to 50,000 items can be handled at 
a total outlay in equipment and cards of $1,000 to $2,000. 




Collections of up to 150,000 items can be handled at a total 
outlay in equipment and cards of $5,000 to $6,000. 

3. The Termatrex equipments are extremely simple, reli- 
able, and as simple to operate as a can opener or a vacuum 
cleaner. Anybody can learn how to operate them in a few
minutes. 

4. Unlimited “vocabulary” of “terms.” Proper names,
dates, numerical values, anything can become a term. (Thus
the “working vocabulary” may run into very large numbers.
However, by combining Termatrex and Alpha-Matrex principles,
the total number of cards can usually be limited to a
few thousand.)

5. There is no limit to the number of terms by which an
“item of information” can be indexed. There is no limit to
the number of terms that can be “coordinated” in a search.

6. The “Geometric read-out” of the search results can be
made uniquely compatible with the numerical storage of the
information items. As a result the actual documents can be
pulled from the files in 10 to 20 seconds per document, at a
labor cost lower than the cost of a mechanized system.

7. No specially trained operators are required, as a result
of the absence of a keyboard punch. Yet, because Termatrex
systems require no coding, the data-input cost is no higher
than the input cost of conventional punched-card systems.
The search labor cost and equipment cost are, of course, much
lower.

8. The Termatrex systems are “modular.” A smaller collection
of items (up to 20,000 to 30,000 items) can be handled
with a few hundred dollars worth of equipment. As the
collection grows to millions of items, the capacity of the
system can be extended simply by adding different pieces of
equipment. At no point will any equipment have to be discarded
or will the data have to be reentered into a different
type of machine.

9. Termatrex systems are “Universal” storage and retrieval 
systems. They can in principle be made to perform any of 
the known retrieval processes that can be performed on elec- 
tronic machines. 

Complex logical search functions can be performed by 
means of our “photo-logical equipment.” By superimposing 
positive or negative or multiple-exposure prints of search 
results, combinations of logical products, differences or sums 
can be obtained. 

10. The Termatrex systems are compatible with about 90 
percent of all devices especially designed for information 
retrieval, such as the “Uniterm” terminal digit card system, 
the punched-card collators and magnetic tape collators. 
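
The “logical products, differences or sums” named in item 9 correspond to the set operations of intersection, difference, and union on the serial numbers recorded in each term card. The following is a minimal sketch in Python, not a description of the photo-logical equipment itself. Each card is modeled as a set of serial numbers, and the card contents are invented for illustration.

```python
# Each term card is modeled as the set of item serial numbers punched in it.
# The card contents below are hypothetical.
welders = {101, 273, 310, 452}
far_east = {273, 452, 777}
foremen = {273, 310}

# Logical product (AND): holes that coincide in both cards.
product = welders & far_east

# Logical sum (OR): holes present in either card.
total = welders | far_east

# Logical difference (AND NOT): holes in the first card blocked by the second.
difference = welders - foremen

print(sorted(product))     # [273, 452]
print(sorted(total))       # [101, 273, 310, 452, 777]
print(sorted(difference))  # [101, 452]
```
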


Brief explanation of basic principle of the Termatrex systems

There are basically only two possible “punched card” sys- 
tems. 

The first is the conventional system, on which most present
punched-card and edge-notched card systems are based. In
these systems each card is dedicated to an “item,” such as a
person or a transaction, and the “terms” describing the item
are punched out on the card. In a search, all cards have to
be searched one by one by an expensive machine.

The other principle, the inverted system, is exactly the op- 
posite. Here each card is dedicated to a “term” and serial 
numbers of the “items” are “punched out” on those cards. 

Consider a personnel file. Each employee is an “item”
identified by a serial number. Each of these “items” is described
by a number of “terms” such as “married” or “single”;
“male” or “female”; age-class; wage-class; education;
rank; position; department; overseas experience, etc. With
a large firm there may be several thousand employees.

On the other hand, a total “vocabulary” of one or two hun- 
dred words is usually more than sufficient to describe all pos- 
sible properties of employees. 

In the conventional system, there is a card for every em- 
ployee and the terms describing each employee are punched 
out on his own card. 

If the inverted system is used, there is a card for every 
“term” in the “vocabulary.” Each employee gets his serial 
number punched out in each of the pertinent “term cards.” 

Thus employee No. 273 might have his number punched in
the following cards: “single”; “male”; “born 1920-25”; “$90
to $100 per week”; “high school”; “foreman”; “welders”;
“tank department”; “France”; “Far East.”

Employee No. 2774 might have his serial number punched
in the following cards: “married”; “male”; “born 1915-20”;
“$100-$110 per week”; “high school”; “machinists’ school”;
“foreman”; “welder.”

When, for example, repairmen are needed for tanks in
Korea, a search can be made for all male employees who are
single, who can repair Army tanks, and who have been in the
Far East. This would be performed by superimposing the cards
“single,” “male,” “welders,” “tank department,” and “Far
East.” The answer to this search consists of the serial numbers
of the coinciding holes. Among these will, of course, be
No. 273.

A search, for example, for all male employees having a 
high school education can be performed by superimposing 
the cards “male” and “high school” and looking for coin- 
ciding holes. 
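
The contrast between the conventional and the inverted arrangement can be sketched in Python as follows. This is an illustration only. Employee 273 carries a subset of the terms given in the text, employees 300 and 301 are invented, and the function names `invert` and `search` are hypothetical.

```python
from collections import defaultdict

# Conventional ("item") file: one record per employee listing its terms.
# Employee 273 follows the example in the text; 300 and 301 are invented.
item_file = {
    273: {"single", "male", "high school", "foreman", "welders",
          "tank department", "France", "Far East"},
    300: {"married", "male", "high school", "welders"},
    301: {"single", "female", "high school", "Far East"},
}

def invert(items):
    """Build one 'term card' per term, holding the serial numbers of its items."""
    cards = defaultdict(set)
    for serial, terms in items.items():
        for term in terms:
            cards[term].add(serial)
    return dict(cards)

def search(cards, *terms):
    """Superimpose the named term cards; return the coinciding serial numbers."""
    if not terms:
        return set()
    result = set(cards.get(terms[0], set()))
    for term in terms[1:]:
        result &= cards.get(term, set())
    return result

term_cards = invert(item_file)
print(sorted(search(term_cards, "single", "male", "welders",
                    "tank department", "Far East")))  # [273]
print(sorted(search(term_cards, "male", "high school")))  # [273, 300]
```

In the conventional file a search must visit every employee record. In the inverted file it touches only the cards named in the question, however many employees there are.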

By the use of tiny holes or photographic reductions, collections
of hundreds of thousands of items can be screened
almost instantaneously with relatively simple equipment.


Present and planned Termatrex systems


A. Present systems. Figure 1 (at the back of this report)
shows the simplest of our machines, the Termatrex-10, template
model. For data entry, a hand drill is used. To enter
an item of information in the machine, all the cards describing
a certain item of information are selected and placed in
the machine. A hole is then drilled simultaneously through
all of these cards at the position corresponding to the serial
number of that particular item of information. To locate
these positions an x-y coordinate system is used.

For searching, the cards describing the search question
are placed in the machine and a light source in the base of
the machine is turned on. The serial numbers of the coinciding
holes can then be read off.

The capacity per set of cards (“basic” capacity) is 10,000 
items of information. Price $190. 
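
The drilling and light-table procedure can be modeled as below. The sketch assumes a grid of 100 by 100 positions, which would give the stated capacity of 10,000 items. The text says only that an x-y coordinate system is used, so the grid shape, the numbering from zero, and all function names are assumptions. Each card is held as an integer bit mask in which a set bit stands for a drilled hole.

```python
GRID = 100  # assumed: 100 x 100 positions gives the stated 10,000-item capacity

def position(serial):
    """Map a serial number (0..9999) to an assumed (x, y) hole position."""
    if not 0 <= serial < GRID * GRID:
        raise ValueError("serial number exceeds card capacity")
    return serial % GRID, serial // GRID

def serial_at(x, y):
    """Inverse mapping: read a serial number off a lit (x, y) position."""
    return y * GRID + x

def drill(cards, terms, serial):
    """Enter one item: drill the same position through every selected card."""
    for term in terms:
        cards[term] = cards.get(term, 0) | (1 << serial)

def lit_positions(cards, terms):
    """Stack the cards; light passes only where every card has a hole."""
    mask = (1 << (GRID * GRID)) - 1
    for term in terms:
        mask &= cards.get(term, 0)
    return [position(s) for s in range(GRID * GRID) if (mask >> s) & 1]

cards = {}
drill(cards, ["male", "high school", "foreman"], 273)
drill(cards, ["male", "high school"], 4105)
drill(cards, ["female", "high school"], 12)

hits = lit_positions(cards, ["male", "high school"])
print(hits)                                 # [(73, 2), (5, 41)]
print([serial_at(x, y) for x, y in hits])   # [273, 4105]
```

Superimposing cards becomes a bitwise AND. The cost of a search therefore depends on the number of cards stacked, not on the number of items indexed.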

Figure 2 shows a card file with 200 cards for the Terma- 
trex-10 machines. The average system requires $50 to $500 
worth of cards. 

Figure 3 shows the Termatrex-10, track model. The drill
is guided on tracks. The machine has the same capacity as
the Termatrex-10, template model, and uses the same type of
card. Price $890.

Figure 4 shows the Termatrex-40, having a basic capacity 
of 40,000 items. It utilizes cards of four times the size of 
the previous machines. If a collection outgrows the smaller 
machine, drilled holes can be very inexpensively “trans- 
ferred” from the smaller to the larger cards. Price $1,150. 

All of these machines are at present being manufactured 
and marketed. One of the largest chemical corporations 
alone now has 16 of our systems in operation. 


Application to the nationwide information flow problem 

The nature of the information flow problem has been 
outlined in proposal I. It encompasses a convergent flow 
from points of origin of information to information collection
centers, as well as a divergent flow (or dissemination)
from these centers to individual users or from one center 
to another. The Termatrex systems are especially designed 
for this convergent-divergent flow problem. 

The “punched” term cards can, at an investment of about 
$200 in equipment, be generated at points of origin of the 
information. 

These term cards can be miniaturized, copied at collec- 
tion centers and assembled into term records for large collec- 
tions encompassing an entire field of science. 

These search records can, at the collection centers, be 
searched by equipment costing less than $10,000. It will be 
able to search millions of information items in a matter of 
minutes. The records are so inexpensive that collection cen- 
ters can exchange copies of their search records, so that each 
collection center can partake in part or all of the total na- 
tional information flow, as desired. 

These collection centers can answer specific questions. They 
can also copy small segments of their search records and dis- 
tribute them to users on a “stated interest” basis. The users 
can scan these records with equipment of about $50 to $750 
in cost, depending on the size of the user’s own collection. 
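
The convergent and divergent flows described above can be sketched as two operations on term records: merging at a collection center and copying out a segment for a user. The text does not say how serial numbers from different points of origin are reconciled. The scheme below, which shifts each origin into its own block of numbers, is an assumption made for illustration, as are the function names and the sample data.

```python
def merge_records(origin_records, block_size=10_000):
    """Convergent flow: combine term cards from several points of origin.

    Each origin's local serial numbers are shifted into its own block so
    that items from different origins never collide (assumed scheme).
    """
    merged = {}
    for index, cards in enumerate(origin_records):
        offset = index * block_size
        for term, serials in cards.items():
            merged.setdefault(term, set()).update(s + offset for s in serials)
    return merged

def segment_for(merged, interests):
    """Divergent flow: copy only the term cards a user has asked for."""
    return {term: set(merged[term]) for term in interests if term in merged}

# Hypothetical term cards produced at two points of origin.
lab_a = {"polymers": {1, 2}, "catalysis": {2}}
lab_b = {"polymers": {1}, "spectroscopy": {1, 3}}

center = merge_records([lab_a, lab_b])
user_copy = segment_for(center, ["polymers", "catalysis"])

print(sorted(center["polymers"]))   # [1, 2, 10001]
print(sorted(user_copy))            # ['catalysis', 'polymers']
print(sorted(user_copy["polymers"] & user_copy["catalysis"]))  # [2]
```

The user's copy supports the same superimposition search as the center's full record, restricted to the terms of stated interest.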

Although it is not the only equipment usable for the national
information flow problem, the cost of the proposed
equipment will be a small fraction of the cost of alternative
solutions.







However, there are many information services already in 
existence based on other types of equipment. The proposed 
equipment can tie in with about 90 percent of these alternative
systems and equipments, which include Uniterm systems,
punched-card collators, magnetic-tape collators, and
general-purpose computers.

There are many functions such as the sorting and trans- 
formation of data and sometimes automatic indexing which 
can be advantageously performed in the general purpose 
computer. The possibility of tying in with these computers 
is of great importance. 

Thus, the role of the general-purpose computer is seen in 
the transformation and indexing of the data, rather than the 
search. Moreover, this work would be performed at com- 
puter service bureaus. Thus, although we do not believe in 
the concept of the computer as a large central search sys- 
tem, its role in the national information problem will be no 
less important. 


In the presentation outlined above, Mr. Jonker pointed out that
Jonker Business Machines is the only company specializing
exclusively in information retrieval equipment, and the only
organization interested and specializing in information flow rather
than in the concept of one single central installation. He urged
support of his plans by the NSF, the Department of Defense, and
the Congress.


Machine Translation, Inc.


As a result of consultations held with Miss Ariadne Lukjanow and
Dr. Rudolf Loewenthal of Machine Translation, Inc., the staff
suggested that they submit a résumé of factual data relative to the
capacity of Machine Translation, Inc., in comparison with the
effectiveness of other machine translation programs, including an
estimate of the time and funds required for the development of a
production model of its unified transfer system (UTS). The following
report was transmitted to the committee on April 4, 1960, in response
to this request, for inclusion in this report:


REPORT ON THE UNIFIED TRANSFER SYSTEM (UTS) 


Introduction 


Since 1954 extensive research in Machine Translation has
been carried out by a number of universities and private
organizations in this country, England, Italy, and the Soviet
Union.

On August 20, 1958, a demonstration of the code matching
technique (CMT) took place on the premises of CEIR
(Corporation for Economic and Industrial Research, Inc.).
The system tested was conceived and developed by Ariadne
Lukjanow, then at Georgetown University.

The demonstration consisted of several articles of Russian
chemical literature which were translated on an IBM 704
computer. The National Science Foundation reported: “The
evaluation by chemists on the intelligibility and completeness
of the text was positive.”

In a subsequent demonstration on November 20, 1958, be- 
fore the International Scientific Congress, articles from other 
fields of knowledge were demonstrated using improved 
programs. 

These tests were not designed to produce perfect translation,
but primarily to prove that machine translation is a
practical possibility. The CMT system is experimental and
possesses too many practical limitations to render it useful
except in a scientific sense. The decimal coding employed;
the segmental individual operations; long fixed-length records;
complex linguistically oriented logic; extensive use of
macroprograming and subroutines make the CMT a purely
experimental model. However, the CMT not only proved 
that a system of this type yields translation and that the 
principles of the approach are valid, but provided us with 
the knowledge and experience necessary to produce an opera- 
tional system. 

The CMT experiment and the experience gained in preparing
it were valuable for the development of a production
model system which we call the unified transfer system 
(UTS). 


Some Aspects and Present Status of the Unified Transfer 
System (UTS) 

Translation is a process of transferring one set of data
into another. As applied to languages, it is a fourfold transfer
process:

1. The transfer of the function of words;

2. The transfer of the form of words;

3. The transfer of the meaning of words; and

4. The transfer of the distribution of words.

In the UTS these four transfers are considered as a single
transfer process. In order to achieve this transfer, we have
devised a classification system for each of the transfers
expressed in the form of a code. We have then merged
these codes into unified code patterns.

The UTS is completely worked out and described in manu- 
script form. It is ready for programing and testing. The 
main features of the UTS are— 

1. The fourfold transfer process of translation (function, 
form, meaning, and distribution) has been transformed into 
one unified transfer process through the establishment of 
the relationship between the constituents of each transfer and 
expressed in a 12-digit code.

2. The individual codes are incorporated into 478 master 
code patterns. Each code pattern determines all possible 
meaning, form, functional, and environmental conditions for 
a class of words. Therefore, each word in the dictionary will
carry a pattern reference number and the dictionary opera- 
tions are thereby greatly reduced. These master code pat- 
terns are to be stored as part of the program. 










3. The operations in the algorithm are performed on the 
basis of relationships between the patterns rather than identi- 
fication of the individual constituents of either pattern. It is 
based on simple ideas of arithmetic progression and com- 
parison of codes. The logic of the system is more abstract, 
less linguistically oriented in its final instructions, and therefore
more suited for machine operations.

4. Maximum utilization of the computer is obtained by 
octal notation in codes, and the use of indirect addressing, 
buffering systems, simultaneous input-output, increased stor- 
age capacity and conversion instructions. 

5. In addition to the manuscript describing the UTS, a set 
of instructions has been prepared. It is ready for actual pro- 
graming and testing on a general purpose computer. 

6. A dictionary of 24,000 canonical entries (word stems)
is ready for automatic conversion into a paradigmatic dictionary.
It is cross-referenced to code patterns, as required by
the UTS. 
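
Items 1, 2, and 4 above describe a 12-digit code, a table of 478 master code patterns referenced by number from each dictionary entry, and the use of octal notation. The sketch below assumes that the 12 digits are octal, which the text does not state outright. Under that assumption one code occupies 36 bits, the word length of the IBM 704, 709, and 7090. The pattern contents, the pattern numbers, the dictionary stems, and the function names are all hypothetical.

```python
CODE_DIGITS = 12  # from the text; treating the digits as octal is an assumption

def pack(code):
    """Pack a 12-digit octal code string into a single integer."""
    if len(code) != CODE_DIGITS or any(c not in "01234567" for c in code):
        raise ValueError("expected 12 octal digits")
    return int(code, 8)

def unpack(word):
    """Recover the 12-digit octal string from a packed integer."""
    return format(word, "012o")

# Hypothetical master pattern table: pattern number -> packed code.
# The text gives 478 patterns; only two invented ones are shown.
MASTER_PATTERNS = {
    17: pack("012345670123"),
    204: pack("777000111222"),
}

# Hypothetical dictionary: each stem carries only a pattern reference number.
DICTIONARY = {"reakts": 17, "rastvor": 204}

def lookup(stem):
    """Resolve a stem to its full code through the pattern reference."""
    return unpack(MASTER_PATTERNS[DICTIONARY[stem]])

word = pack("777777777777")
print(word.bit_length())   # 36
print(lookup("rastvor"))   # 777000111222
```

Storing a short reference number with each stem, and the full patterns once in the program, is one way to read the claim that dictionary operations are “greatly reduced.”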


Capacity and application of the unified transfer system 
(UTS) 

The UTS has not been designed for any particular computer.
It can be programed for any large general purpose
computer. We have estimated that on the IBM 709 this
system would produce between 25,000 and 40,000 words per 
hour. On the IBM 7090 this number would be increased to 
more than 100,000 words per hour. On STRETCH it could 
be implemented with a translation rate of close to 3 million 
words per hour. 
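
The estimated rates can be turned into running times for a given workload. In the sketch below the rates are those quoted above, while the million-word workload and the function name are assumptions chosen for illustration. Where the text gives a bound rather than a range, the bound is used for both ends.

```python
# Words per hour as estimated in the text. Each tuple holds (low, high);
# a single quoted bound is repeated for both ends.
RATES = {
    "IBM 709": (25_000, 40_000),
    "IBM 7090": (100_000, 100_000),      # "more than 100,000"
    "STRETCH": (3_000_000, 3_000_000),   # "close to 3 million"
}

def hours_needed(words, machine):
    """Return (best case, worst case) hours to translate `words` words."""
    low, high = RATES[machine]
    return words / high, words / low

workload = 1_000_000  # hypothetical backlog, in words
for machine in RATES:
    best, worst = hours_needed(workload, machine)
    print(f"{machine}: {best:.2f} to {worst:.2f} hours")
```

For the assumed million-word backlog this gives 25 to 40 hours on the IBM 709, 10 hours or less on the IBM 7090, and roughly 20 minutes on STRETCH.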

In addition, the UTS is applicable not only to Russian- 
English but to other combinations of languages. We have 
already investigated its applicability to Russian-German, 
German-English, and Chinese-English. This system has all
the necessary provisions for automatic indexing of trans- 
lated material and contains sufficient data for the devel- 
opment of an abstracting system. Both these processes, 
indexing and abstracting, can be incorporated into the UTS 
in such a fashion that they would not require any substantial 
increase of translation time. We have likewise investigated 
the possibility of simultaneous translation from one language 
into several on the same pass through the machine. We 
found it feasible and practical. 

All the work done on the UTS thus far has been accom- 
plished without any financial support from either the Gov- 
ernment or private sources. The programing and testing, 
as well as the compilation of large dictionaries, are entirely 
beyond the means of Machine Translation, Inc., both in terms 
of money and personnel. 

Provided sufficient funds were forthcoming, the Russian- 
English production phase of translation could be accom- 
plished within 6 to 9 months. Simultaneous translation 
could be achieved within 3 or 4 months. Indexing would 
require 3 or 4 months; abstracting, for which some research 
is necessary, would take from 6 to 8 months. Instead of
working on these phases separately, we would be prepared
and willing to work on them simultaneously. In that case
all these tasks could be accomplished within a year.

The UTS has several features: (1) Postediting: No post- 
editing will be required. Missing words in the text will 
appear in the translation as transliterated words. Those 
will, of course, have to be incorporated in the text. Since 
the size of the general dictionary is expected to consist of 
50,000 canonical entries (word stems), the missing words 
would belong to technological subject matter. As most of 
the Russian technical terms are derived from the Latin,
Greek, or various European languages, they will present no 
problem. (2) Dictionaries: In the process of the develop- 
ment of the UTS, as well as during its operational use, it 
will be necessary to produce and/or implement large diction- 
aries in various fields of technical knowledge. Machine 
Translation, Inc., is prepared to compile and publish them. 


Estimated financial support required for the development of 
the unified transfer system (UTS) 

1. Russian-English translation: Dictionaries of 50,000 ca- 
nonical entries, i.e., more than 1,250,000 paradigmatic forms 
covering two subject matters chosen by the Government; programing
and testing of the UTS on the computer: $125,000.
Only about one-third of this amount would be spent on labor 
and wages; almost two-thirds would be required for machine 
time and equipment. 

2. Multilingual translation: The UTS, once programed, 
could be adapted to several language combinations, either in 
sets of two languages or for simultaneous translation from 
one language into several languages. This task would de- 
mand an additional expense of approximately $25,000. Ad- 
ditional dictionaries for each combination of languages could 
be furnished at between $25,000 and $50,000 each, depending 
on the language, size of the dictionary, and subject matter 
involved.

3. Indexing: By working on Russian-English translation, 
the indexing of the resulting translations could be developed 
and rendered without additional cost. 

4. Abstracting: The abstracting phase of the UTS has to be 
explored and its cost would be subject to further study. Pro- 
vided we received the funds ($125,000) for programing and 
testing of the translation phase of the UTS, the abstracting 
phase of the research and development would not exceed one- 
half of the original development cost; i.e., the abstracting 
phase would cost approximately $60,000, or could be done on 
the basis of cost-plus contract. 

5. Some remarks on the organization for the use of an op- 
erational system: In our opinion the primary objective of 
machine translation is to provide the U.S. Government with 
an efficient tool, i.e., an operational system for translating, 
indexing, and abstracting in the fields of technical and 
natural sciences. Once an operational system has been conceived,
developed, and tested, an organization for its use
should be established. Such an organization, by virtue of
the problems involved, has to be a combination of computer
installation and publishing house. It should be capable of
producing and publishing translations, indexes, and abstracts.
In addition, it should be able to perform the following
services:

1. Efficient distribution of materials produced. 

2. Publication of bulletins at regular intervals
(weekly, biweekly, and/or monthly) bringing to the attention
of the Government and private industrial research
organizations the highlights of the available
translated material in various fields of technical knowledge.

3. Maintenance of an efficiently organized library.

Only an organization capable of providing these facilities 
will properly utilize an operational system for machine trans- 
lation and render service to the U.S. Government, research 
and educational institutions, as well as to private industry. 

Machine Translation, Inc., is prepared to work with any 
computer installation assigned by the Government, although 
we would like to point out that the previous experience and 
knowledge acquired by the personnel of the Corporation for 
Economic & Industrial Research, Inc., would make the 
CEIR a desirable choice. We are further prepared to 
place the system and all our findings at the disposal of the 
U.S. Government, since service to the country of our adoption 
was the original purpose of our endeavors. 


McGraw-Hill Publishing Co.


The McGraw-Hill Publishing Co. has been in contact with the 
Committee on Government Operations practically from the beginning 
of its study of science and technological information procedures, 
nearly 3 years ago. 

The company has carried on a continuous study of programs for 
establishing technical and industrial data-processing centers, and 
has published a number of articles dealing with various aspects of 
the problems involved. 

In response to a request from the staff, Mr. Curtis G. Benjamin,
president of McGraw-Hill Book Co., Inc., submitted comments relative
to these problems for the information of the committee. Mr.
Benjamin was specifically requested to give the committee the benefit
of his views on the operations of the Bio-Science Information Exchange
(BSIE), and relative to the proposed establishment of a
similar exchange service in the physical sciences under the
administrative jurisdiction of the Smithsonian Institution. His comments
regarding this operation, which coincide with views expressed by
others representing industrial organizations, are as follows:


First, I can report that both of the McGraw-Hill Publish- 
ing Co.’s major operating divisions (the publications division, 
devoted to magazine publishing and industrial information 
and services, and the McGraw-Hill Book Co., devoted to book 
publishing and technical writing and translation services)
are very actively exploring programs for establishing technical
and industrial data-processing centers that might make
desirable and profitable adjuncts to our present operation.
We are convinced that such centers should be natural and
attractive extensions of our conventional methods and media
for disseminating technical and industrial information. In 
fact, we are deeply concerned over the prospect that mech- 
anization systems and centers may soon displace our conven- 
tional methods and media in certain important areas and thus 
leave us standing high and dry in these areas. 

Though we have explored and debated several proposals 
and plans, we have not yet put a mechanized information 
system of any sort into operation. However, we do now have 
before us plans for two systems which we want to start 
just as quickly as we can see our way clear to go ahead with 
them. 

The first is a computer-based Russian-English lexicon (or 
word bank) of scientific and technical words and phrases—a 
mechanized, continuously updated system that will provide 
the largest and most comprehensive “dictionary” of scientific 
and technical terminology ever prepared in any language. 
We feel that this system would be useful and important in 
several aspects of our national science and defense programs. 

The second is another computer-based system for storage 
and retrieval of technical information on properties, char- 
acteristics, reliability, and availability of electronic compo- 
nents. We visualize this as a comprehensive centralized data 
center run by McGraw-Hill for the use of the electronics in- 
dustry and Government agencies on a subscription plan. It 
would, of course, be tied in with the interests of our Elec- 
tronics magazine. If successful, it could serve as an operat- 
ing pattern for similar information systems on mechanical 
components in other fields where McGraw-Hill is heavily 
represented by magazine and book publications. 

Coming now to another major point of your study, I must 
say that our thinking and planning in McGraw-Hill has been 
somewhat deterred by the developing pattern of Government 
preemption of this rapidly growing field. We have observed 
that most of the major systems established to date have been 
Government centered, either by direct or indirect planning 
and support. (This seems to have been both inevitable and 
justified under existing conditions.) We have also observed 
that, as your staff memorandum 86-1-65 suggests, many of 
the large Government departments and agencies have elected 
to start and maintain their own systems. Further, and more 
important to us, when Government agencies have decided to 
“farm out” system operation or system research and develop- 
ment, they have heavily favored nonprofit organizations—the 
professional societies, university research divisions, and such 
laboratories as Battelle. A survey of National Science Foundation
grants and contracts in this area shows that almost
all of them have gone to nonprofit organizations, and here we
are careful not to confuse the operators of systems and centers
with the designers and producers of the “hardware” for
the centers. In other words, it seems that the commercial 
firms that design and manufacture equipment are fully in the 
picture while commercial “communications” firms such as 
McGraw-Hill are left on the sidelines. Naturally, this makes 
us wonder whether there is a place for us in the picture, and, 
if so, whether we could ever make it a profitable one in com- 
petition with the many large nonprofit organizations which 
now have preferred positions. 

In response to a specific question in the staff memorandum, 
I must say that, notwithstanding its origin and history, the 
Smithsonian Institution operates as a U.S. Government 
agency. It is listed in the “U.S. Government Organization 
Manual,” it receives most of its current operating funds from 
the Government, and its publications are printed by the Gov- 
ernment Printing Office and distributed by the Superintend- 
ent of Documents. To anyone who claims that this is not a 
Government organization, I would ask the simple question: 
Then why do its publications qualify for printing by the 
GPO? 

Finally, I would like to make a personal observation on 
my experiences in serving in recent years on two Govern- 
ment committees that have been concerned with official policy 
on scientific information. The first of these committees was 
the Panel of Scientific Information, which made a report to 
the President’s Science Advisory Committee in 1958, which 
served as a basis for the President’s policy statement in De- 
cember of that year, and as a basis for the establishment, 
under title IX of the National Defense Education Act, of a
Science Information Council in the National Science Foun- 
dation. I was later appointed as a public member of this 
Council, and I am just now finishing my term on it. Both of 
these assignments have been very interesting and informa- 
tive experiences, and I can say that one of my major concerns 
in both was to try to keep the interest of private enterprise 
from being ignored or pushed aside. I do not intend to sug- 
gest that this neglect of private commercial interest in the 
national picture was either intentional or willful, but I do say 
that I had constantly to remind my colleagues that the in- 
terests of private industry should be taken into consideration. 
When reminded, my colleagues nearly always responded in a 
sympathetic and understanding manner, but I know from 
other experiences in Washington that many officials in Gov- 
ernment agencies are not so sympathetic to the interests of 
commercial organizations.

Naturally, I wonder about the consequences of the estab- 
lishment of all these large informational enterprises in tax- 
exempt organizations and institutions. Obviously, it will 
mean a higher tax burden on the profit-making organizations 
that are left on the sidelines. Thus, the building of a struc- 
ture on one hand weakens the source of its support on the 
other. This is, of course, the way to “statism,” and it may 
be the way in which we must go in this area. But, if so, it
should be pursued openly and with the full knowledge and
consent of the Congress and the public, not by day-to-day 
administrative decisions and actions that may be contrary to 
the spirit if not the letter of public policy. 

I take it that this whole matter of public versus private 
interests will be an important nuance of your study. So I 
have offered my personal observations for whatever they may 
be worth in this connection. 


Ramo-Wooldridge


In response to the staff's request, Mr. D. R. Swanson, Manager, 
Synthetic Intelligence Department, Intellectronics Laboratories, of 
the Ramo-Wooldridge Division of Thompson Ramo Wooldridge, 
Inc., submitted, on February 26, 1960, the following brief outline 
of the company’s work in the field of information retrieval : 


Thank you for your inquiry to Dr. Wooldridge and your 
interest in our work in the field of information retrieval. I 
shall outline here the general nature of this work and then ar- 
range to make available to you either in the form of discus- 
sions or written material any further details in which you 
may be interested. 

We are engaged in both research and development in the 
following areas: 

1. Several sizable efforts involving system studies of mili- 
tary data handling problems have been carried out. In each 
of these cases, problems of information storage and retrieval 
have been of major importance. A number of reports have 
been produced, most of which are available through Air 
Force or Army channels. The following three contracts are 
particularly relevant to these large scale system studies. 

(1) Design study for an integrated USAF intelligence
data handling system (system 438L), Contract
AF-30(635)-2867 (completed in 1957).

(2) Subsystem I intelligence data processing, Contract
AF-30(602)-1814.

(3) Army data processing test facility, Contract
DA-36-039-SC-80078.

2. Our investigations of practical problems of information
retrieval for certain military applications led us to recognize
the importance of basic research in this area, and accordingly
for the past several years such research has been
underway. This research is experimental in nature and ad- 
dressed to the relatively long range conceptual problems of 
fully automatic indexing, retrieval, and “correlation” of in- 
formation. Existing general purpose computing equipment 
is used extensively for experimental purposes. Hardware 
development is not a part of this particular program. 

I am enclosing a contract announcement by the Council on 
Library Resources which presents a brief outline of the goals 
of our work on automatic indexing. This first 9-month effort 
has been completed and a progress report covering this period 
is in preparation. Two interim progress reports have been
submitted to the Council and are available for distribution.5,6
A separate but related study on a probabilistic approach
to indexing has also been carried out and a report
prepared.7

3. We have under development what has thus far been a 
company-sponsored project to fabricate a device which per- 
mits direct searching of natural language information re- 
corded on microfilm text for purposes of information re- 
trieval and automatic indexing. The basic principles have 
been explored and the feasibility of developing a relatively 
inexpensive machine has been demonstrated; a developmental
model will be completed in approximately 1 year.

4. The R-W 400 polymorphic data processing system has 
been developed and has potentially important applications 
in the field of information retrieval and particularly to complex
and multifaceted data handling systems. This system
contains an optional number and variety of functionally in- 
dependent modules. These communicate via a central elec- 
tronic switching exchange. Each module is designed, with- 
in practical economic and functional limits, to maximize 
system adaptability over a wide range of problem types and 
sizes. A detailed description of the R-W 400 appears in the 
January—February issue of “Datamation.” 

In addition to the foregoing areas rather directly related 
to problems of information retrieval, both at the applied and 
theoretical level, we are engaged also in numerous periph- 
erally related activities, such as research on the automatic 
translation of languages, which are relevant to problems 
encountered in any kind of a science information center as 
mentioned in your staff memorandum. 

With regard to your suggestion that we comment on other 
information retrieval systems with which we have some fa- 
miliarity, allow me here to make only the following remarks. 
I think it is useful to separate and identify at least initially:

(a) Conceptual problems;
(b) Systems organization problems;
(c) Hardware developments.

Conceptual problems in the field of information retrieval, 
dissemination, and, in general, the communication of scientific
information, can by no means be considered as solved;
much basic research is required. At the same time, I do not
imply that progress toward practical solutions based on the 
known state of the art should be for any such reason delayed. 

The problems of sound system design in our opinion prob- 
ably transcend all others in considering a plan as ambitious 
as a centralized science information activity. The com- 
plexity of this area is such that further discussion in this 
letter would not be appropriate, but the nature of the task 
is such that the importance of item (c) (hardware) is prob- 
ably overshadowed by the importance of concept, philosophy, 
and approach. Rather than comment more specifically, let 





5 Word Correlation and Automatic Indexing Progress Report No. 1 C82-9U9.
6 Word Correlation and Automatic Indexing Progress Report No. 2 C82—OT 1.
7 Probabilistic Indexing Technical Memorandum No. 3 Data Systems Project Office.

me express only the doubt that a major part of the answer to 
the total problem lies in any existing device. At the same 
time, it seems to me not unlikely that significant efficiencies 
could result from the use of various types of specialized 
processing equipment put together in a soundly conceived 
total system. I would urge, though, that the concept of a
highly centralized national information service not be ac- 
cepted as a foregone conclusion, or as an immediate basis 
for planning and initiation of hardware developments. 

Should you desire further information on anything touched 
upon in this letter, please do not hesitate to let me know. 
We would, of course, be happy to discuss these questions with 
you and your committee at your convenience; it would be 
helpful in this connection if you could indicate which of the 
activities mentioned here are of particular interest so that 
we could then make available for such discussion the best 
qualified of our technical staff personnel. 


CONTRACT WITH THOMPSON RAMO WOOLDRIDGE, INC., FOR 
RESEARCH IN MACHINE INDEXING 


On August 10, 1959, the award of a contract for the first
phase of an investigation into problems of mechanical indexing
and retrieval of information to Ramo-Wooldridge, a
division of Thompson Ramo Wooldridge, Inc., of Los Angeles,
Calif., was announced today by Verner W. Clapp,
president of the Council on Library Resources, Inc.

The council is a nonprofit organization established in 1956 
with a Ford Foundation grant to “assist in the solution of 
library problems.” All library work, Mr. Clapp pointed out,
is based on procedures which, like indexing and cataloging,
make it possible to organize records and to secure information
from them. To the extent that these processes can
be effectively mechanized, the services of libraries might be 
improved. 

The proposed research program will include the recording 
in machine language (i.e., on punched cards, punched tape, 
or magnetic tape) of a small experimental library of scientific
text (circa 300,000 words). This library text will be
“raw,” i.e., it will not have been previously organized, classified,
or indexed in any way. A general purpose computer
will be programed to search this text, in response to questions
formulated by scientific workers, with a view to discovering
and printing out information relevant to the answers.

A number of techniques have been devised for the search 
and to test its effectiveness. The measure of effectiveness 
will take into account not only the relevant, but also the ir- 
relevant information provided by the machine. To control 
the judgments of relevance and irrelevance, a group of ex- 
perts in the subject of the experimental library will fa- 
miliarize themselves with its entire contents and be able to 
perform direct searches as a check against the machine.
Similarly, several science librarians will also be able to check
the machine, making use of traditional indexing methods. 

No prior distinction will be made between automatic in- 
dexing and text searching. Investigation of the text search- 
ing will be used as an approach to the study of automatic 
indexing. 

Some of the fundamental problems which will receive at- 
tention during this investigation are the following: 

(a) The inquirer-inquiree relationship. How does the inquirer’s
question get translated into language which the machine
recognizes and understands?

(b) The hierarchical relationship. How does the machine
distinguish between general and specific (e.g., how does it
know that cats are included in mammals, or that leopards are
cats)?

(c) “The Patent Office problem.” Can the machine sup- 
ply information tomorrow on subjects which are not fully 
formulated or conceptualized today? How can it bring its 
ideas down to date? 

Among the techniques which it is expected to explore as 
bases for machine recognition of requested information are 
the following:

(a) Simple specification of words which may be expected 
to be found in the informative response, and which the ma- 
chine should instantly recognize. 

(b) Specification of words rich in relevant subject content:
“key words.”

(c) Extension of “key words” to include synonyms and 
close associates. 

(d) Specification of idiomatic phrases, cooccurrence of 
“key words” within sentences, etc. 

(e) Specification of a “word spectrum,” i.e., a quantita- 
tively expressed complex of words which may be matched or 
approximated—the machine’s success in approximation being 
measured in terms of a “correlation coefficient.” 

(f) Specification of low-frequency words (in the total
library) having importance for meaning in inverse proportion
to their frequency (e.g., the word “cosmotron” is more
“unusual,” hence more informative, than the word “experiment”).

(g) The preparation of “microconcordances” (lists of 
words and word groups, plus context, characteristic of particular
subject fields) to aid inquirers in formulating their
questions. 
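
Techniques (e) and (f) lend themselves to a small illustration. The sketch below is not the Ramo-Wooldridge program, whose details the announcement does not give; the function names, the logarithmic form of the inverse-frequency weight, and the choice of the Pearson coefficient as the “correlation coefficient” are all assumptions made for the example.

```python
import math
from collections import Counter


def word_spectrum(text):
    """Return word -> count for a text (a crude 'word spectrum')."""
    return Counter(text.lower().split())


def inverse_frequency_weights(library_texts):
    """Weight each word by log(total words / occurrences of the word).

    Assumed form: the rarer a word is in the total library, the larger
    its weight, in the spirit of technique (f).
    """
    totals = Counter()
    for text in library_texts:
        totals.update(word_spectrum(text))
    n = sum(totals.values())
    return {word: math.log(n / count) for word, count in totals.items()}


def correlation(spec_a, spec_b):
    """Pearson correlation of two spectra over their combined vocabulary."""
    vocab = sorted(set(spec_a) | set(spec_b))
    if not vocab:
        return 0.0
    xs = [spec_a.get(w, 0) for w in vocab]
    ys = [spec_b.get(w, 0) for w in vocab]
    mean_x, mean_y = sum(xs) / len(vocab), sum(ys) / len(vocab)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat spectrum has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Hypothetical three-sentence "library", invented for the example.
library = [
    "the experiment used the cosmotron",
    "the experiment was repeated",
    "another experiment was planned",
]
weights = inverse_frequency_weights(library)
print(weights["cosmotron"] > weights["experiment"])  # rarer word weighs more
```

In this toy library “cosmotron” occurs once and “experiment” three times, so the rarer word receives the larger weight, which is the behavior item (f) describes.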

The investigation, which will require 9 months, is expected 
to permit the formulation of a more comprehensive investi- 
gation involving a considerably larger experimental library 
and total effort. The study will be directed by Don R. Swan- 
son, whose primary recent concerns have been with digital 
computer applications involving business and military information,
library information retrieval, and the direction of a
project on the mechanical translation of languages. Dr.
Noam Chomsky (Institute for Advanced Study, Princeton)
and Dr. Paul L. Garvin (Georgetown University) will serve 
as linguist consultants on the project. In addition to members
of Ramo-Wooldridge’s scientific staff who will be associated
with the project, the results of the mechanized information
retrieval will be compared with those of traditional
methods of indexing by two science/technology librarians,
Mrs. Johanna A. Tallman and Mr. Donald V. Black (Uni- 
versity of California at Los Angeles). 


Recordak Corp.


In response to a request for information relative to the service now
being provided to Government agencies, and suggestions as to contributions
the company might make toward improvement of the Federal
programs for the retrieval of scientific and technological information,
Mr. A. K. Chapman, president of Eastman Kodak Co. (of
which Recordak Corp. is a subsidiary), replied as follows:


Your letter of January 11, 1960, asks about our work in 
developing and manufacturing equipment in the information 
processing field. We very much appreciate your calling our 
attention to your interest in systems for data recording, proc- 
essing, and retrieval. 

As I am sure you recognize, until we have a more complete 
understanding of the general problem it would not be pos- 
sible for us to make specific recommendations as to systems 
or equipment. However, we are active in this field and will 
be pleased to explore with you the possibility of our contrib- 
uting to the solution. One of our developments which may 
be applicable is the Recordak Minicard system for micro- 
filming printed pages or other information on small bits of 
film. On each of these bits of film there is also recorded a
code that makes possible completely automated and very 
rapid sorting, collating, and retrieval. We are enclosing a 
brochure which gives a somewhat more complete outline of 
the Minicard System and Equipment. The Recordak Corp., 
a subsidiary of Eastman Kodak Co., has the sales and service 
responsibility for all of our equipment of this general type. 
Since your letter invites personal consultation, we are sending 
copies of this correspondence to Recordak and assure you 
that they will be contacting you soon.

We suggest that for some parts of your problem the use of 
microfilm in rolls might be a useful tool, particularly now 
that a code system is available to expedite the finding of spe- 
cific images. Still another very new Kodak development 
which should not be overlooked in your search for informa- 
tion handling systems is the DACOM unit; an electronic unit 
which translates signals from magnetic tape into letters and 
numbers on microfilm.

The Recordak representatives will be pleased to discuss 
with you these several types of equipment and their applications
in the complex problem you are studying.

At the direction of Mr. Chapman, Mr. William E. Townsend, the 
Washington representative of Recordak Corp., submitted a number 
of company publications setting forth detailed information regarding
its Minicard system, to which references have been made in other sec- 
tions of this report. These various volumes deal with the use of 
Minicard film record as a common-language medium; descriptive outlines
of the “Minicard System in Operation”; “Basic Photography for
Minicard System”; the “Application of the Kodak Minicard System 
to Problems of Dissemination”; and various other reprints relative 
to the utilization of the system. 

Elsewhere in this report it is pointed out that the Department of 
the Air Force has been utilizing the Minicard system on a very 
extensive scale. The operation has been in effect for over 114 years, 
and the staff was informed by the Eastman Kodak Co. that it now 
constitutes one of the largest operating mechanized retrieval systems
in the world in terms of number of documents in the files, but not in 
terms of cost or number of people. Officials of the company stated 
that the equipment has been more than adequate to meet their require- 
ments, and two additional Air Force installations are underway at 
other locations. Repeated staff requests to the Department of Defense 
for information relative to the operations and cost of this system were 
never complied with. 

CIA has been testing a set of Minicard equipment. According to 
representatives of Recordak Corp., the equipment has shown retrieval
quality superior to the present Intellofax system. The problems of 
cost and capacity have not yet been fully explored. 

Following further conferences with representatives of the Eastman
Kodak Co. and Recordak Corp., Mr. J. M. Arnold, president and
general manager of Recordak, submitted the following additional in- 
formation relative to its services to the Federal Government, with 
supporting material. 


Even though the title of your request is quite specific, we 
would like to call your attention to a number of government 
programs that we have been associated with. Even though 
some may not fall in the purely scientific information field, 
we are confident that these programs will be of tremendous 
interest since they deal with document recording, storage, and 
retrieval. 

Enclosure A, a copy of the publication “Photo Methods for 
Industry,” contains an article (see p. 256) which briefly but 
completely describes the Census FOSDIC system and its use 
in the decennial census which is currently in progress. It is 
interesting to note that the millions upon millions of census 
documents which previously were handled in their original 
paper form will be recorded on some 50,000 rolls of Recordak 
microfilm which are being exposed on 25 precision Recordak 
cameras at the Census Jeffersonville installation. These rolls
of film, when processed, will be read by the FOSDIC scanner
at a high rate of speed and the information will be recorded
on magnetic tape to be fed into computers.

We are also pleased to have had a part in the Weather 
Bureau’s program involving some 350 million punchcards 
which are currently being filmed by the National Air Weather 
Center in Asheville, N.C. Once again, the film, when processed,
can be read at extremely high rates of speed by a different
type FOSDIC scanner developed by the National Bureau
of Standards. It is significant that each 100-foot roll of 
microfilm contains approximately 10,000 punchcards. En- 
closed with our letter, you will find a copy of the National 
Bureau of Standards Newsletter which describes this pro- 
gram. 

One of the bottlenecks in many document retrieval pro- 
grams is the limitation of high-speed printout. The Eastman 
Kodak Co. is developing, and the Recordak Corp. will market, 
a cathode ray printer which we call DACOM. The enclosed 
material describing DACOM states that its average operating 
speed is 16,000 characters per second. While this speed is 
many times faster than high-speed mechanical printers, it is 
well to note that DACOM has potential speeds of approxi- 
mately 100,000 characters per second which will allow it to 
operate on line or off line at speeds comparable to existing 
computers.

We believe that the cathode ray printer will play a tre- 
mendous part in EDP systems. As an example, the Social 
Security Administration in Baltimore utilizes such a device 
to create its plain language record on microfilm of more 
than 130 million active accounts. Social Security records 
are available in a matter of a few seconds through Rec- 
ordak’s Lodestar Reader, which is described in one of our 
enclosures. 

The Lodestar, with its magazine loaded film, offers new 
opportunities where rapid retrieval is essential. The Recordak
Corp. is working with the Army Ballistic Missile
Agency, the Atomic Energy Commission, the Department of
the Air Force, and the U.S. Patent Office to provide a system
for automatic retrieval of documents discretely identified by
as many as 10 alpha numerical characters. These characters
recorded at the time of filming the original documents will 
enable an operator to find any document within a roll of film 
in less than 12 seconds after keyboarding the request. This 
system will also have the ability to provide hard copy of se- 
lected documents within a few seconds. 


The supporting material referred to in Mr. Arnold’s letter, includ- 
ing a brief explanation of the Minicard system referred to in this 
report, follows: 

FEEDING FOSDIC 


An electronic marvel which turns census data into 
tape computer input, FOSDIC digests tons of micro- 
filmed information, provided the film is prepared 
within strict tolerances. Here is how Census feeds 
its delicate genius. 


Processing the tons of data collected during the 1960 census 
will be FOSDIC—film optical scanning device for input to 
computers. It is the brain child of Bureau of the Census and 
Bureau of Standards scientists. Under this system, data 
sheets marked by the enumerators will be photographed on
microfilm, then fed into FOSDIC. Out will come magnetic
tape which will be fed into computers to put raw data into 
usable form. There is only one trouble with the arrangement. 
FOSDIC happens to be a feeding problem. 

Like many data-processing machines, FOSDIC can distinguish
by mark sensing between more dense marks which
mean something and less dense marks which mean nothing.
However, unless the microfilm is processed carefully, mean- 
ingful marks will not show up strongly enough to let 
FOSDIC record them. The problem faced by Census is 
quality control of the microfilm input. 

The quality control problem revolves about one characteris- 
tic of microfilm, its density. If the film is too dense, the dif- 
ference between light and dark areas will be too slight for 
FOSDIC to distinguish. If the film is not dense enough, 
stray marks and smudges will be recorded. In short, the 
mark sensing will work only within a specified range of
density differences. 

These limits, as determined by Densichron readings, range 
between 0.75 and 1.05. Ideal density is 0.9 ±0.15. The
responsibility for establishing a quality control system which
would permit the maintenance of proper density, provide a
method of checking all phases of production, yet not slow up
the processing or exceed allowable costs, rests on Dr. Herman 
Fasteau, Chief, Quality Control Branch, Bureau of the 
Census. 
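
The density limits just stated amount to a simple acceptance test. A minimal sketch follows, assuming one Densichron reading per roll; the function names, the labels, and the roll identifiers are invented for illustration and do not describe the Census Bureau's actual procedure.

```python
LOWER_LIMIT = 0.75  # lowest acceptable density reading quoted in the article
UPPER_LIMIT = 1.05  # highest acceptable density reading quoted in the article


def classify_density(reading):
    """Classify one density reading against the quoted limits."""
    if reading < LOWER_LIMIT:
        # Film not dense enough: stray marks and smudges would be recorded.
        return "too thin"
    if reading > UPPER_LIMIT:
        # Film too dense: light and dark areas differ too little to sense.
        return "too dense"
    return "acceptable"


def rolls_to_recheck(readings):
    """Return roll -> verdict for every roll outside the limits."""
    verdicts = {roll: classify_density(d) for roll, d in readings.items()}
    return {roll: v for roll, v in verdicts.items() if v != "acceptable"}


# Hypothetical readings for three rolls.
print(rolls_to_recheck({"roll-1": 0.90, "roll-2": 0.70, "roll-3": 1.10}))
```

Readings exactly at 0.75 or 1.05 are treated here as acceptable, on the assumption that the quoted limits are inclusive.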

Actually, the 1960 FOSDIC is the third in a line of scan- 
ners developed by Standards and Census. The first model, 
in 1954, read data from sheets which had been marked with 
special pencils. The 1957 model scanned microfilmed punch- 
cards. In the current model, the raw census sheets them- 
selves are filmed and the film scanned. 

No comparable cost figures are available but the services 
of 2,000 punchcard operators used during the 1950 census will 
not be required. The appropriation for the 1950 census was 
$130 million; in 1960, it is $110 million. If the changing price
level is considered, the savings are even greater, since the 
1960 dollar is worth somewhat less than the 1950 dollar was. 
In terms of noses counted, the cost was about 81 cents for 
160 million noses compared to 61 cents for about 180 million 
noses this year. 
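
The per-person figures can be checked directly from the appropriations and the round population counts quoted above. The short sketch below does only that arithmetic; the function name is an assumption, and no adjustment for the changing price level is attempted.

```python
def cents_per_person(appropriation_dollars, persons_counted):
    """Return the census cost per person counted, in cents."""
    return 100.0 * appropriation_dollars / persons_counted


# Figures as quoted in the article (the population counts are its round numbers).
cost_1950 = cents_per_person(130_000_000, 160_000_000)  # 81.25
cost_1960 = cents_per_person(110_000_000, 180_000_000)  # about 61.1

print(f"1950: {cost_1950:.0f} cents per person")
print(f"1960: {cost_1960:.0f} cents per person")
```

Both results round to the 81 and 61 cents given in the text.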

Far more important from the operational point of view is
the great reduction in the possibility of human error and 
the speed with which census data can be made available. 
Census estimates that the first data series should be ready for
publication by early November, with future series published
from 6 to 18 months earlier than if punchcards had been used.

The first stop for census data will be Jeffersonville, Ind., 
where the actual microfilming will be done. Formerly a 
quartermaster depot, the location offered advantages of avail- 
able working space and clerical labor supply. 

From Jeffersonville, exposed film will go to the Recordak 
labs in Washington, D.C., for processing, after which they
will arrive at the Bureau of the Census, Suitland, Md., for
feeding into FOSDIC. 

Planning for this operation began with the search for 
suitable paper, printing inks, and pencils for use with the 
census forms. The data sheets were designed so that the 
answer spaces were not superimposed, lest marks from one 
page accidentally mark up the page underneath. However, 
the major problems were photographic. 

Cameras chosen for the census project were Recordak mi- 
crofile film units, model D, using 16-millimeter film. There 
will be 29 units at Jeffersonville, 4 of which belong to 
Census, the remainder being rented from Recordak for the 
census. Of the 29, 27 will be used with 2 remaining on 
standby. 

Eight different emulsions were tested before a film was 
selected, Recordak Type A. One of the criteria was the 
readings obtained on FOSDIC. A maximum voltage was 
obtained when this film was scanned. 

Development, by Recordak in Washington, is being treated 
as a constant. Exposure time, based on the 1/42 second shutter
speed of the Recordak camera, is another constant. The
major variables affecting density are illumination and the 
time lag between exposure and development. 

Each operator will check the illumination at his camera 
with a photoelectric cell. Standard illumination is 60 foot- 
candles, but this may be varied to fit individual conditions. 
Each roll of film will be checked in Washington after it is 
processed and the quality evaluation sent to Jeffersonville. 
If the density falls below the 0.75 lower level, a higher light 
intensity will be used. 

Extraneous light which might affect readings will be con- 
trolled by proper use of curtains over windows and between 
camera positions. 

Latent image fade is a result of a time lag between expos- 
ure and development. With a reading of 60.5 foot-candles, 
density was found to drop from 0.95 to 0.87 in the period 
from 24 to 48 hours after exposure. Processing and illumination
are calculated on this time lag, so that the density
will not fall below the acceptable lower limit.

Of equal importance is control of spacing, due to the operation
of FOSDIC. The census forms have black boxes
0.15-inch square which are index marks. The beam from 
FOSDIC starts at a box, then travels down the answer cir- 
cles, scanning each one and recording the data, skipping 
blank circles. 

The pages, which measure 173g by 1414 inches, are care- 
fully positioned under the camera in a specially designed 
holder. The spacing between exposures is set by Recordak 
technicians at 314 inches+1 inch. Exceeding these limits 
will result in skipped frames. A dip test, developing a short 
roll of 46 exposures from each camera at random intervals, 
is expected to reveal spacing defects before they become serious
and cause loss of recording time.

The operating schedules are based upon an estimated one 
complete roll every 2 hours. Each roll will have about 900 
frames, but not every frame will be used, as data from an 
enumeration district will not be split between 2 rolls. A new 
roll will be started for each district. 

At the beginning of each roll is a series of four gray bars, 
used to compare density against the standard characteristic 
curve of the film. This curve is, ideally, a 45° line having a 
gamma of 1. 

To maintain the uniformity of film stock, only a few days’ 
supply will be kept at Jeffersonville. Raw stock will be 
stored at 70° to 75° F. 

Developmental work on FOSDIC covered a 10-year period, 
with E. i. Stein and M. L. Greenough playing prominent 
roles in the project. Each FOSDIC (there are now five, 
four in use and one standby) cost $125,000 to build, plus an 
estimated $250,000 for developmental and research costs. 

As of this writing, FOSDIC is the only machine of its 
type, according to the Census people, typical of the pioneer
work done by the Bureau in data processing. The first 
UNIVAC was made under contract at Census (it is kept at 
Suitland as an exhibit) and the needs of the census hastened 
the development of punchcards. 


THE MINICARD SYSTEM 


The Minicard system, a continuing development of the 
Eastman Kodak Co., is essentially an electronic-microfilm 
system for the unit record storage and single-search retrieval 
of documentary information. 

The Minicard system combines the mobility of unit record 
cards with the inherent ability of photography to compress 
graphic information at great reductions on tiny pieces of 
film. 

The heart of the Minicard system for information storage 
and retrieval is the Minicard film record itself. This unit 
record, measuring only 16 by 32 millimeters (approximately 
⅝ by 1¼ inches) has the ability to store document images
of archival quality together with a machine-readable code 
for finding the information. The Minicard code is in the 
form of tiny black and white spots which are analogous to 
the holes in a punched card or paper tape, or magnetized 
spots on magnetic tape. The process of photographic reduc- 
tion is quite similar to conventional microfilming, except that 
the Minicard film and equipment are designed for more ex- 
treme reductions and much higher resolution of detail. 

The Minicard record has excellent information storage 
capacity. A completely coded Minicard record possesses 
more than five times the code capacity of a standard, punched 
tabulating card. In addition, the logic of the machines per- 
mits the handling of several related cards as though they were 
one unit record. The pattern of rectangular black and white 
spots which constitutes the Minicard code is in effect a machine-readable
abstract of the subject matter in the original
source material. This code can include information comparable
to the number, title, and author of a paper, titles of
the various chapters, and a summary of the content of each 
paper. Any alphabetic or numeric character as well as 
straight natural language may be entered in the Minicard 
code field. Because so much information, including graphic 
information, can be entered on Minicard film records, new ob- 
jectives can be achieved in documentation. 

Three basic formats now are available for recording graph- 
ic material on Minicard film records: (a) either 12 images 
at 60 to 1 reduction of documents of legal page size, 8½ by 14
inches, or (b) 1 image at 38 to 1 reduction for maps or charts
up to 18 by 22 inches, or (c) 1 image at 20 to 1 reduction for 
aerial photographs of 9 by 9 inch or 9 by 11 inch sections 
of a 9 by 18 inch aerial frame. 

The unusually high reduction ratio of 60 to 1 used for pho- 
tographing document pages provides a considerable saving 
in space. Each film image is actually 3,600 times smaller in 
area than the original document. The information storage 
density of reduced documentary images is tremendous (on 
the order of 80,000 characters per square inch for a docu- 
ment) and is infinite for a photograph. 
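
The relation between the linear reduction ratio and the saving in area can be worked through in a few lines. The sketch below is illustrative only: the function names are assumptions, and the originals are taken as a legal page of 8½ by 14 inches and a map of 18 by 22 inches, as given in the text.

```python
MM_PER_INCH = 25.4


def area_ratio(reduction):
    """A linear reduction of R to 1 shrinks the image area R squared times."""
    return reduction ** 2


def reduced_size_mm(width_in, height_in, reduction):
    """Return the size on film, in millimeters, of an original given in inches."""
    return (width_in * MM_PER_INCH / reduction,
            height_in * MM_PER_INCH / reduction)


print(area_ratio(60))  # 3600, the figure quoted in the text

page_w, page_h = reduced_size_mm(8.5, 14, 60)   # legal page at 60 to 1
map_w, map_h = reduced_size_mm(18, 22, 38)      # map or chart at 38 to 1
print(f"page image: {page_w:.1f} by {page_h:.1f} mm")
print(f"map image:  {map_w:.1f} by {map_h:.1f} mm")
```

On these figures a legal page at 60 to 1 occupies roughly 3.6 by 5.9 millimeters, so 12 such images cover about half the area of a 16 by 32 millimeter record, which is consistent with part of the record being given over to the code.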

Minicard film records can be manipulated readily in the various
machines. They are easily duplicated. Furthermore,
the Minicard records for any document can be revised at any 
time. New pages can be recorded on another Minicard record 
and added to an existing Minicard group. It is also possible 
to add, delete, or even rearrange the coding during any dupli- 
cation step. 

A fully self-contained Minicard system consists of a number
of major and accessory pieces of equipment. Technical
descriptions of this equipment, and of how it functions in
producing and handling Minicard film records, are given
in the enclosure. The methods and operations of a Minicard
system are described in the following text in general terms: 


Preparing and recording documents 


Before documents are recorded on high-resolution Mini- 
card film, they are analyzed to determine subject index codes. 
Minicard code fields can accept a classification code such as 
the Universal Decimal Classification or the Library of Con- 
gress Classification as well as any other numeric or alpha- 
numeric indexing code. Where there is need to do so, nat- 
ural language may be entered as well as coded language. 
The index codes and descriptive data for a document are 
entered on a suitable form which is then attached to the 
document. 

Documents which have been analyzed are sent with the 
index data to a typewriter-tape punch operator for the 
preparation of punched paper tapes of the index data. After 
verification of the punched tapes, batches of documents, with 
the appropriate punched tape of index data attached, are 
sent to the camera for the recording operation. Index data 
from the punched tapes are input automatically to the camera
by means of a tape reader. The camera operator checks to
see that document number and index data are properly 
matched while the tape is being read. This check is made 
easily by comparing the number on the document with the 
number displayed in front of the camera operator. The 
document is then placed on the camera easel for exposure. 

After exposure, the Minicard film is removed from the 
camera and then automatically processed, inspected for gross 
defects, and cut into individual master Minicard film rec- 
ords. The code of a master Minicard record then may be 
machine verified against the code on the original tape used 
in the camera. 

Organizing and maintaining the Minicard file 

Information storage and retrieval systems which have only 
one index entry per document require an examination of 
every index code for all items in the file during any search. 
Where the file is large, the time required to make a complete 
search may be unreasonably long. 

To minimize the long search problem for the Minicard 
system, the working file is set up on the basis of multiple 
entries, instead of a single entry, for each document. Mini- 
card duplicates—one duplicate for each previously desig- 
nated file code in the master Minicard record—are made for 
each document. Duplicates are sorted to separate sections 
of the file so that each file section is comprised of all cards 
containing a particular code (or, in some cases, combinations 
of codes). For most questions, the requirements of a search 
are satisfied if only a single small section of the file is pre- 
sented to the selector-sorter for search. The detailed search 
of a single file section is accomplished in a few minutes. 
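
As a present-day illustration only, and not a description of the Minicard equipment itself, the multiple-entry file organization described above can be sketched in a few lines of Python. The record layout, the function names build_working_file and search_section, and the sample codes are assumptions made for this sketch. One duplicate of each record is filed in the section for each of its designated file codes, every duplicate carries the full set of index codes, and a search examines only the one section named by the question.

```python
from collections import defaultdict

def build_working_file(master_records):
    """File one duplicate of each record under each of its file codes.

    master_records: iterable of (doc_number, index_codes, file_codes).
    Every duplicate keeps the complete set of index codes.
    """
    sections = defaultdict(list)
    for doc_number, index_codes, file_codes in master_records:
        duplicate = (doc_number, frozenset(index_codes))
        for code in file_codes:
            sections[code].append(duplicate)
    return sections

def search_section(sections, section_code, required_codes):
    """Scan a single file section for records carrying all required codes."""
    required = set(required_codes)
    return [doc for doc, codes in sections.get(section_code, [])
            if required <= codes]

# Hypothetical sample data: (document number, index codes, file codes).
master = [
    (101, {"FUEL", "MISSILE", "TEST"}, {"FUEL", "MISSILE"}),
    (102, {"FUEL", "AIRCRAFT"}, {"FUEL"}),
    (103, {"MISSILE", "GUIDANCE"}, {"MISSILE"}),
]
working_file = build_working_file(master)

# Only the MISSILE section is scanned, not the whole file.
hits = search_section(working_file, "MISSILE", {"FUEL", "MISSILE"})
print(hits)
```

Because each duplicate carries every index code, a question combining several codes can still be answered from one section, at the cost of storing several copies of each record.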

The master Minicard records are also used for direct dis- 
semination of new material as it is received to provide a “cur- 
rent awareness” service. In accordance with standing re- 
quests, new material being input to the working file may be 
screened for specific information. Duplicates of appropriate 
Minicard film records are then produced automatically for 
dissemination directly to the requesters. 
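
The screening of new material against standing requests can be sketched in the same illustrative way. The names screen_new_material, new_records, and standing_requests are hypothetical, and the matching rule (a record is sent when it carries every code named in the standing request) is an assumption; the source does not state how standing requests are coded.

```python
def screen_new_material(new_records, standing_requests):
    """Return, per requester, the new document numbers whose index codes
    include every code named in that requester's standing request."""
    deliveries = {}
    for requester, wanted_codes in standing_requests.items():
        wanted = set(wanted_codes)
        matches = [doc for doc, codes in new_records if wanted <= set(codes)]
        if matches:
            deliveries[requester] = matches
    return deliveries

# Hypothetical new accessions and standing requests.
new_records = [
    (201, {"FUEL", "MISSILE"}),
    (202, {"GUIDANCE", "MISSILE"}),
]
standing_requests = {
    "requester_a": {"MISSILE"},
    "requester_b": {"FUEL", "AIRCRAFT"},
}
deliveries = screen_new_material(new_records, standing_requests)
print(deliveries)
```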

The requester may have these expendable duplicate Mini- 
card records mounted for him in 3- by 5-inch aperture cards 
for his own personal file. These mounted Minicard records 
may be studied by him at any time in a viewer, or inserted in 
an enlarger-print processor which automatically produces 
enlarged, dry prints. 

The master Minicard records may also be duplicated to 
create an “insurance” file for storage in a remote area or in 
an alternate location and used in event of a disaster to re- 
create a new working file. 

After duplication, the master Minicard records are ready to
be sent to the “master file.” The master file is kept in order 
according to document accession number or accession date. 
Incoming Minicard records are merely added to the end of 
this file.

The Minicard records in the working file are constantly
used for searching to satisfy requests for information. After 
a search is complete and expendable duplicates have been 
made, the records are returned to the file. 

Using the Minicard file 

Requests for documents are received from individuals as 
well as from other document centers. Because a large num- 
ber of requests are received, each request is recorded on an 
appropriate form. The question data are coded and used 
for the preparation of paper tapes and the control panels 
which are to be used on the selector-sorter. The Minicard 
records to be searched are taken from the working file on the 
basis of the question data. Complex questions present no 
problem because all cross-reference codes are included on 
every duplicate Minicard record in the working file. Selected 
Minicard records are duplicated at once and returned to the
file section from which they were taken. 

Duplicates made from selected working file Minicard 
records are sent directly to a requester to fulfill his request. 
In those instances where prints of selected documents are 
requested, prints are made from the Minicard records on the 
enlarger-print processor. Since duplicates are always sup-
plied in response to requests for documents, a requester may
retain the Minicard records as well as any prints which he
receives. If an individual has a viewer available, he will in
many instances keep Minicard records for his own personal
file.


Summary of Minicard film record advantages 

1. The Minicard system for storage and retrieval of in- 
formation is capable of accepting a large volume of informa- 
tion. In fact, a file containing millions of documents which 
have been adequately indexed can be handled with ease and 
efficiency. 

2. The reduction characteristic of the Minicard system can 
provide important savings in space. A volume comparison
of the records alone (not including filing and handling hard-
ware) indicates that Minicard records reduce a document
library in the ratio of 600 to 1, and a photographic file in the 
ratio of 200 to 1. 

Classified material can be stored in a much smaller space 
and can thereby reduce the cost of security precautions at 
both the regular locations and at emergency locations 
(alternate). 

3. The Minicard system combines the complete documentary 
material on the same unit record as the index for finding the 
information. This combination eliminates multiple-step
search and retrieval and replaces it with a single search which
produces the material itself, not an address of wanted ma-
terial. Information retrieved is immediately available for
reference or for enlarging as hard copies for examination
or detailed study.





4. The Minicard system is capable of organizing a file of 
documentary material into any desired number of cross-refer- 
ence categories so that the user need search but a small sec- 
tion of the entire file to answer even the most complicated 
request for information. 

5. Minicard files are inherently organized for growth, and 
can be updated. 

6. The Minicard system protects against loss of valuable
source material. Documentary information retrieved is
delivered as expendable, duplicate, Minicard film records in 
individual aperture cards or as hard copy enlargements. The 
Minicard working file is always complete, ready for the next 
search. 

7. The Minicard system has added another dimension to 
the concept of “common-language” systems. Most “common- 
language” systems are concerned with the transfer of in- 
formation in digital form from one machine to another— 
usually through media such as punched cards or paper tape. 
The Minicard system provides a film medium which not only 
carries digital information but also graphic data in the form 
of page images. 


(Note.—The figures and photographs referred to in this summary
have been incorporated in the files of the Committee.)


REMINGTON RAND UNIVAC


After staff consultations with representatives of Remington Rand 
Univac, Mr. J. W. Schnackel, vice president and general manager, 
forwarded, on April 7, 1960, the following comments with respect 
to the general subject of information storage and retrieval systems:


Remington Rand Univac has been active for many years 
in design, development, and manufacture of all kinds of data 
processing systems and equipment. Since 1948 many of the 
basic and original concepts in the field have been originated 
by Remington Rand Univac. 

The attached technical comments represent the combined
knowledge, experience, and philosophy of the engineering 
staffs of our military and commercial engineering activities 
at St. Paul, Minn., and Philadelphia, Pa. 

These comments cover the following broad areas: 

1. Total systems approach;

2. Systems management;

3. Remington Rand concepts of information storage
and retrieval systems;

4. Remington Rand Univac equipments in production 
or under development which are applicable to informa- 
tion storage and retrieval problems. 

We appreciate the opportunity to provide this information 
to the Senate Committee on Government Operations. If 
additional information or elaboration of any of the state-
ments included in the attached comments is desired, we will
be pleased to provide further details. 





Total systems approach 


Our first remarks deal with what Remington Rand Univac 
believes are rather basic requirements of a total systems
approach, a concept which many believe should govern the
creation of any large information-handling system. 

Information storage and retrieval, as a field, can be thought 
of as a part of the larger field of information processing. 
Other subfields having equal stature with information storage 
and retrieval systems are data-processing systems, the use of 
which is well known, and control systems, wherein a com- 
puter is used to perform closed-loop control of a constantly 
dynamic process. 

Information storage and retrieval is probably the newest of 
these subfields to receive attention. Its foundation is based 
on information theory. It is also closely allied with library 
science in that the degree of sophistication of an index will 
help to determine the efficiency, size, and speed with which a 
given system will perform. The very preliminary design
aspects of such systems are based on a mathematical ap- 
proach. It has been found that statistical and probability 
theory as well as mathematical modeling techniques are very 
helpful in designing useful and efficient information-han- 
dling systems. 

There is much that still needs to be learned about informa- 
tion theory. However, some worthwhile steps have been 
taken and Remington Rand Univac believes that it is not too 
early to think in terms of first approaches to very large in- 
formation storage and retrieval systems. Because there are 
some unknowns, it is necessary to take a rather broad and 
cautious outlook toward the design and implementation of 
such systems. 

Listed below are some of the sequential steps which we be- 
lieve should be considered by a system user prior to and dur- 
ing the creation of a system, for solving an information- 
handling problem. There are several different types of ac- 
tivity necessary in the creation of a large information sys- 
tem. These activities are carried on by different kinds of 
people at different times. The following are the classic or- 
ganizational steps for creating a large information-handling 
system:

A. Feasibility evaluation and system study;

B. Functional design of the system;

C. Detailed design of the system and selection of
equipment;

D. Installation and operation of the system and train-
ing of personnel;

E. Continuation engineering.

Each of the above phases overlaps to a degree, so that com- 
munication between the participants in each phase is sus- 
tained. For instance, during the early phases the system 
analysts should always interpret the user’s ultimate needs. 
Not only present state-of-the-art equipment concepts but also 
the more advanced concepts must be considered. Equipment
specialists should function in an advisory capacity through
communication with the analysts so that the analysts will 
provide definite equipment design leadership but always 
within present or reasonably anticipated equipment 
capabilities. 

An important aspect of the whole effort is for the analytical 
and design groups to constantly consider the impact on the 
system of a natural growth trend within the user’s environ- 
ment. A very careful estimate of the user’s expanding re- 
quirements in terms of loads and reaction times must be 
made. Likewise, as the system grows, expandability and 
compatibility of system components must be provided. 
Flexibility must be considered to as high a degree as possible, 
so that future changes in the user’s information handling 
objectives will not cause a serious impact on the user’s then 
existing system or operating organization. 

Another important requirement is that the implementation 
and operation of the system by the user should be on an evolu- 
tionary basis. Some related problems which must be dealt 
with in the early analytical and design phases are:

1. Controlled rates of system growth;

2. Phased funding requirements;

3. Development of interim systems over a period of
years which allow the user to build up gradually to an
optimum equipment configuration;

4. The necessity for gradual but sustained movement
through the various phases from study to implementa-
tion without a loss of continuity;

5. The necessity for modest but continued basic sys-
tems research after the initial study and design efforts
have been terminated.

Typical objectives and activities within each of the 
above-mentioned phases are discussed below. In general 
they apply to the design and implementation of any large 
information handling system. 

A. Feasibility evaluation and system study.—The object of 
this phase is a detailed delineation of system requirements 
based on: 

1. Identification of required functions and their inter-
relationships;

2. Input and output loads;

3. Timing requirements;

4. Communication requirements;

5. Interim and future capacity requirements. 

The work of this phase should be performed primarily by
a team of mathematicians, psychologists, library scientists 
(for information storage and retrieval systems), and per- 
haps operations research people. 

B. Functional design of the system.—The object of this 
phase is a generalized concept of the overall system based 
on a detailed analysis of the results of phase A. The design 
at this point usually is highly definitive from a functional
standpoint and provides the total basis for later equipment
selection. The work of this phase should be performed by
a smaller group of the same type of personnel as well as 
consulting groups of system and equipment engineers. 

C. Detailed design of the system and selection of equip- 
ment.—The object of this phase is to select the individual 
equipment to be used for the system based on the results of 
phase B. It is unwise to consider specific equipment in the 
system design before this time. A survey of available equip- 
ment might tend to show that no existing equipment will per- 
form the requirements as specified by the functional design. 
In such cases, new equipment research and development may 
be required or perhaps modifications in the system design are 
necessary. 

The work of this phase should be performed with the help 
of well-qualified systems and equipment engineers, with con- 
sultation from the system analysts who participated in prior 
phases of the effort. 

D. Installation and operation of the system and training of 
personnel.—The object of this phase is to accomplish an 
orderly transition of system philosophy, programing and op- 
eration from the designer to the user. The leadership for this 
phase should be provided by the designer. The work of this 
phase should be performed by a fairly large group of field 
people such as engineers, technicians, and instructors. 

E. Continuation engineering.—The object of this phase of 
the work is an extension of phases A and B, so that an itera- 
tive effort of analysis, design, and debugging is sustained on a 
modest basis but over a fairly long period of time. Such work 
should be performed by the original system designers in con- 
junction with the users and should be used for revision and 
updating of the system, with incorporation of new equip- 
ment concepts as computer technology progresses. 


Systems management 


As can be seen from the previous discussion, there are 
many problems connected with the design of large informa- 
tion systems. The results of premature equipment choices, 
improper systems planning, lack of comprehension of growth 
requirements, etc., can have a most adverse effect on the user’s 
operating capability and funding ability. On the other 
hand, the long-range benefits of large, well-planned infor- 
mation systems can be enormous. 

The design of such systems should be undertaken by an 
organization in Government or industry which has not only 
personnel who are trained in the pertinent disciplines, but 
which also has had experience in the design and manufacture 
of computing equipment. 

It is of equal importance that the organization be willing 
and able to provide the strongest possible management team 
to integrate all the available experience and resources within 
the organization to achieve successful completion of the 
system objectives. 





Remington Rand UNIVAC concepts of information storage
and retrieval systems

For several years Remington Rand UNIVAC has had a 
group working full time on research in information retrieval. 
This work has been sponsored in part, by contracts with— 

(1) The Office of Naval Research;

(2) The Federation of American Societies for Ex-
perimental Biology, through the National Science
Foundation; and

(3) The U.S. Air Force, Office of Scientific Research.

A. Federation proceedings.—One of the more recent prod- 
ucts of the research program at Remington Rand is the 
March 1960 issue of Federation Proceedings, which demon-
strates for the first time that a computer can be used on a
practical basis for programing a large scientific meeting.
Federation Proceedings also contains the first standard book
index to be prepared by a computer (UNIVAC I). The
2,526 papers presented at the annual meeting of the Federa-
tion of American Societies for Experimental Biology, April
11-15, 1960, were indexed and programed in less than 8 con-
secutive hours of computer time. This short time may be
compared to an estimated 30 man-weeks of time spent on 
these activities had they been done manually. 

The index is of high analytical quality and is carefully 
cross-referenced. One of the more significant aspects of 
the federation project is that authors participated actively 
in providing data about how their papers should be pro- 
gramed and indexed. This bypassed any necessity for either 
human or machine extraction of what was important about 
a paper and made it possible to automate the indexing proc-
ess, using human input that represented very high judg-
ment value. The machine has been used to analyze the nat-
ural language input for “expressions of choice” and then
standardize, organize and cross-reference the input data.
The programs developed for this project are now available
to other organizations through Remington Rand Univac
Service Bureau. 

B. ASTIA retrieval system.—Remington Rand Univac
has been working actively with ASTIA to develop a retrieval 
system that will be efficient for their purposes. 

ASTIA is using the Remington Rand Univac solid-state
computer for processing an average of 2,000 requests for 
documents per day. These are requests that specify the 
wanted documents by a particular document number. Proc-
essing these requests is primarily an inventory control activ-
ity. The information retrieval routines will be run on the
same equipment during a different time period each day. 
The purpose of the information retrieval routines is to an- 
swer ASTIA’s average of 50 questions per day that require 
selection of all of the documents pertinent to a subject field 
stated by the inquirer. A typical subject search might be 
to locate all of the documents about the use of a particular
fuel in missiles. The initial size of the file to be searched by
the automated system will be about 250,000 documents. 

The ASTIA retrieval system will be the most sophisticated 
system ever designed for practical application. It is sched- 
uled to be used as soon as tape components are available for 
the ASTIA equipment, or approximately July 1, 1960. It is 
expected that the system will allow ASTIA to perform in 
one machine retrieval run all of the subject searches for any 
day, in approximately 1 hour of computer time. 

The output of the retrieval run will be a card, identifying 
the search and giving the file numbers of all the documents 
pertinent to that search. In addition, there will be separate 
cards bearing the abstracts of each of the pertinent refer-
ences. The output is produced by a high-speed printer. A
highly significant aspect of the system is the fact that very 
specific questions can be asked, involving as many search 
terms as desired. In the event that the question is too 
specific, so as to retrieve too few documents, up to nine alter- 
native questions, or corollary questions can be asked as well. 
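
The search behavior described above can be illustrated with a short Python sketch. It is not the ASTIA program, whose logic the letter does not give. The function names, the min_hits threshold, and the policy of trying alternatives in order and keeping whichever retrieves the most documents are all assumptions; only the limit of nine alternative questions comes from the text.

```python
MAX_ALTERNATIVES = 9  # the letter mentions up to nine alternative questions

def run_question(file_index, terms):
    """Return file numbers of documents indexed under every search term."""
    wanted = set(terms)
    return sorted(num for num, doc_terms in file_index.items()
                  if wanted <= doc_terms)

def subject_search(file_index, primary, alternatives=(), min_hits=2):
    """Run the primary question; while fewer than min_hits documents are
    found, try the alternatives in order and keep the most productive one.
    (min_hits and this fallback policy are assumptions of the sketch.)"""
    if len(alternatives) > MAX_ALTERNATIVES:
        raise ValueError("at most nine alternative questions are allowed")
    best_question, best_hits = primary, run_question(file_index, primary)
    for question in alternatives:
        if len(best_hits) >= min_hits:
            break
        hits = run_question(file_index, question)
        if len(hits) > len(best_hits):
            best_question, best_hits = question, hits
    return {"question_used": sorted(best_question), "file_numbers": best_hits}

# Hypothetical file: document number -> index terms.
file_index = {
    1001: {"FUEL", "MISSILE", "HYDRAZINE"},
    1002: {"FUEL", "MISSILE"},
    1003: {"FUEL", "AIRCRAFT"},
}
result = subject_search(
    file_index,
    primary={"FUEL", "MISSILE", "HYDRAZINE", "STORAGE"},  # too specific
    alternatives=[{"FUEL", "MISSILE", "HYDRAZINE"}, {"FUEL", "MISSILE"}],
)
print(result)
```

In this sample the primary question retrieves nothing, the first alternative retrieves one document, and the second retrieves two, so the second alternative is the one reported.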

The system also provides a record of each search per- 
formed, in machine code (punched cards), with statistics 
about the search. Thus, reports about the functioning of the 
system can be prepared automatically, and analysis of the effi- 
ciency of the system can be made from the empirical data 
generated by the functioning of the system. A generalization
has been made from the flow chart of the ASTIA retrieval
system that allows for the application of the same technique
to many other retrieval problems, using many alternative
configurations of equipment.

C. Remington Rand Univac information retrieval re-
search.—A bibliography of information retrieval research
papers produced by Remington Rand personnel is attached.
General statements about current research are contained in 
the issue of the National Science Foundation publication. 

Remington Rand is currently consulting with other groups, 
such as the National Institutes of Health, the National Re- 
search Council’s cardiovascular literature project, the Chemi- 
cal Corps, the American Institute of Physics, the National 
Library of Medicine, the Institute for Advancement of Medi- 
cal Communication and numerous industrial firms, about 
their information retrieval problems. The reason so much 
of the consultation is in scientific areas is not only because 
there is a need for the automation of scientific communica- 
tion, but because the Remington Rand research staff has had 
broad training in the sciences. Subject training in the field 
of application of computers is an asset, and frequently an 
essential requisite of the system analysts guiding an applica- 
tion. 

D. Future studies and trends.—Because of the recent suc- 
cesses of Remington Rand Univac, in going from theory to 
practice in information retrieval, as described in the previous
comments, we believe that now is the time to explore larger
problems. A certain amount of trial and error will be neces- 
sary in arriving at any satisfactory automation of scientific 
communication on relatively general levels. It is the opinion 
of the information retrieval staff at Remington Rand Univac 
that generalized systems should be built cautiously from the 
raw materials at hand, such as the currently existing scientific 
indexing and abstracting services. 

One of the important functions of electronic equipment 
will be to coordinate the activities of decentralized work 
groups: to standardize policy, to standardize methodology,
to reduce duplication, and to fill gaps in the network of satis- 
factorily functioning parts of a communication chain. The 
field of military intelligence has much in common with the 
field of scientific intelligence. Remington Rand Univac
plans to explore further the similarities and dissimilarities 
of the two fields. A great deal of classified research has been 
done already in the area of military control systems. 

Automatic programing (the use of artificial languages) has 
much in common with automatic translation of natural lan- 
guage. Both activities employ a basic tool of information 
retrieval, the structured lookup file, or thesaurus. Reming- 
ton Rand Univac has always been a leader in automatic pro- 
graming. Recently the company has been establishing it- 
self as a leader in information retrieval. The combination of 
research findings in the two areas may help to further re- 
search in machine translation, which will in turn be impor- 
tant to the organization of large communication networks. 

E. Application of Remington Rand Univac equipment.—
Remington Rand Univac has a keen interest in making its
computer equipment compatible with the transmission and re-
ceiving elements of communication systems. Currently it
provides this facility with paper tape systems, such as tele-
type tapes, by means of its Synchrotape typewriter. The
LARC, described earlier, has a capability for producing
microfilm copies of information it stores or produces. Ad-
ditional and more significant developments of this sort are
expected in the near future.

The newer equipment produced by Remington Rand is be-
ing made efficient for both information retrieval and data
processing. The solid state computer and the LARC offer
bit manipulation, rather than just word manipulation. This 
provides much greater freedom for performing the logical 
operations characteristic of information retrieval. It also 
makes possible, as mentioned for the ASTIA retrieval sys- 
tem, the performance of many kinds of information-handling 
activities by means of the same equipment. 
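
The letter does not say how bit manipulation is applied to retrieval. One common approach, offered here only as an assumed illustration in Python, is to give each index term its own bit, store each document's terms as a single integer, and answer AND/NOT questions with bitwise operations. The term list, the document numbers, and the names encode and select are hypothetical.

```python
# Hypothetical vocabulary: each index term is assigned one bit position.
TERMS = ["FUEL", "MISSILE", "AIRCRAFT", "GUIDANCE"]
BIT = {term: 1 << position for position, term in enumerate(TERMS)}

def encode(terms):
    """Pack a collection of index terms into one integer bit mask."""
    mask = 0
    for term in terms:
        mask |= BIT[term]
    return mask

# Hypothetical file: document number -> bit mask of its index terms.
documents = {
    301: encode(["FUEL", "MISSILE"]),
    302: encode(["FUEL", "AIRCRAFT"]),
    303: encode(["MISSILE", "GUIDANCE"]),
}

def select(documents, all_of=(), none_of=()):
    """Select documents having every term in all_of and no term in none_of."""
    need = encode(all_of)
    ban = encode(none_of)
    return sorted(number for number, mask in documents.items()
                  if (mask & need) == need and (mask & ban) == 0)

fuel_not_aircraft = select(documents, all_of=["FUEL"], none_of=["AIRCRAFT"])
print(fuel_not_aircraft)
```

A whole question is tested against a document with two machine-level operations, which is the kind of freedom in logical operations that word-only manipulation would not give as directly.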


Remington Rand UNIVAC equipments in production or under
development which are applicable to information storage
and retrieval problems

Remington Rand UNIVAC currently has in production 
or under development several basic information storage and 
retrieval systems and equipment:

A. Military systems: 

(1) Military control system (classified): We have devel- 
oped and have in production a very large, high speed con- 
trol system. A number of online inputs are communicated 
directly to a very high-speed computer. This computer then 
evaluates the strategic situation and makes an optimum as- 
signment of target-weapon combinations. A rather high 
order of machine intuitive capacity operates within a pre- 
viously established framework of strategic and tactical 
doctrine. 

(2) Military intelligence analysis system (classified) : We 
have developed a large analytical system which is used to con- 
stantly track and plot all applicable vehicle movements over 
a large portion of the earth’s surface. The inputs are from 
many distant sources and the outputs are constantly dis- 
played at remote locations. 

(3) Military guidance systems (classified): We have 
developed two guidance systems for radio-inertial missiles 
which have the capability of target evaluation, missile 
launching, and guidance. Some of the computer program- 
ing concepts which were developed for these systems con- 
siderably advanced programing state of the art. 

B. Commercial data processing systems: 

(1) UNIVAC 120: Small scale computer for punched- 
card systems, featuring simplified “plugboard” programing. 
This computer is specifically designed to operate in conjunc- 
tion with a UNIVAC punched-card equipment installation. 
A catalog describing UNIVAC’s punched-card machines is 
enclosed. 

(2) UNIVAC file computer: The UNIVAC file computer
is an electronic data processing system which features the 
simultaneous operation of a central, general-purpose com- 
puter with one or more large-capacity, random-access mag- 
netic drums and an integrated system of input-output units 
and other auxiliary devices. 

(3) UNIVAC II: The UNIVAC II system is an inte- 
grated, large-scale electronic system designed for automatic 
high-speed handling of alphabetic and numeric data. It is 
especially well adapted to handling a large volume of input 
and/or output on magnetic tape. A recording density of 250 
characters per inch and a tape speed of 100 inches per second 
are standard for reading into and recording from the 
computer. 

(4) UNIVAC scientific 1103A: The 1103A is a general- 
purpose, digital computer system for applications requiring 
large storage capacity, high operating speed, and great pro-
graming versatility. Unusual logical features of the com-
puter facilitate maximum utilization of its very high inherent
speed. In addition to performing large-scale calculations, 
the system is adaptable to a wide variety of applications, 
including simulation and control in real time. 

(5) UNIVAC 1105: The model 1105 computer is a general 
purpose, high-speed digital computer system incorporating 
the features of the UNIVAC scientific computer, 1103A, i.e., 
high internal speed, a large repertoire of instructions and a 
double-length internal register, together with the prerequi-
sites of a data processing system, i.e., large internal storage
capacity and the ability to transfer large amounts of infor- 
mation to and from the external equipment. This combina- 
tion makes the 1105 a truly general-purpose computer system. 

(6) UNIVAC solid-state 80 or 90: UNIVAC solid-state 
80 or 90 computer systems are available for operation with 
magnetic tapes and punched cards as input/output. The 
UNIVAC solid-state 80 or 90 magnetic tape system and the 
UNIVAC solid-state 80 or 90 computer system operating 
from punched cards are general-purpose, digital computing 
systems capable of performing a wide range of data-process- 
ing tasks requiring large volumes of input/output data. The 
magnetic tape system is composed of the central processor, 
a tape synchronizer with up to 10 magnetic tape handling 
units, a high-speed reader, a read-punch unit and a high-
speed printer. The units are interconnected to form a
completely online, balanced data-processing system. The 
UNIVAC solid-state 80 or 90 computer system operating 
from punched cards is composed of the central processor, 
a high-speed reader, a read-punch unit and a high-speed
printer.

(7) UNIVAC LARC: The Remington Rand UNIVAC
LARC is a general-purpose computing system designed to 
solve a wide variety of problems that are beyond the range 
of current large-scale systems. It is both a business and 
scientific data-processing system. In fact, this new Reming- 
ton Rand system removes the line, established previously in 
both computer design and applicability, that has divided 
scientific and business data-processing systems. The elimina- 
tion of this division represents a tremendous advance, since 
scientific principles continue to be integrated into the solution 
of business and industrial problems at an ever-increasing 
rate. 

The UNIVAC LARC incorporates many modern electronic 
data-processing advances including modular construction, 
large data storage, versatile input/output, solid-state cir- 
cuitry, “time-shared” operation, and extremely fast computa- 
tion. 

C. System component developments: 

(1) Fast memories: We are currently developing small 
random access memories with access times which are an order 
of magnitude faster than the fastest known memories avail-
able today. The switching speeds of these devices are ex-
tremely high.


(2) Random access file: Currently under development is a 
large random access file. It is visualized that such a device 
when applied to a system could store in excess of 1 billion 
bits of information. A number of such devices, controlled 
by a large central processor, could be used in a number of 
information storage and retrieval systems. 


BIBLIOGRAPHY 


AUTHORSHIP 


Claire K. Schultz. “Punched Cards in Libraries.” Presented to the 
Science-Technology Group, Special Libraries Council of Philadel- 
phia and Vicinity, Sept. 25, 1951. Published: Bulletin of the Special 
Libraries Council of Philadelphia and Vicinity, April 1952. 

Claire K. Schultz. “Mechanized Punched Card Systems for Recording
and Searching Literature.” Presented at the Special Libraries As- 
sociation 43d Annual Convention, May 26-29, 1952. 

Claire K. Schultz. Thesis: “Coding Literature on Punched Cards.” A 
study submitted in partial fulfillment of requirements for the de-
gree of master of science in library science, Philadelphia, June
14, 1952. 

Claire K. Schultz. “A Working Application of the I.B.M. 101 Elec- 
tronic Statistical Machine for Literature Searching.” Presented at 
the Annual Meeting of the American Documentation Institute, 
Cleveland, Ohio, Nov. 4-5, 1954.

Claire K. Schultz. “An Application of Random Codes for Literature 
Search,” in Perry and Casey, Punched Cards. 2d ed., 1958. 

Claire K. Schultz and John J. O’Connor. “Designing More Efficient 
Indexes.” UNESCO Library Bulletin, Nov.-Dec. 1959.

Claire K. Schultz and Clayton A. Shepherd. “Directions in the Re-
trieval of Scientific Information: A Discussion of the Evolutionary
Trends Within Secondary Sources.” Remington Rand Report.

Claire K. Schultz and Clayton A. Shepherd. “A Computer Analysis of
the Merck Sharp and Dohme Indexing System.” To be published.

Claire K. Schultz. “Limits of Mechanization in Small Applications.”
Presented to the American Documentation Institute, Annual Meet- 
ing, Oct. 22-24, 1959. 

Claire K. Schultz and Clayton A. Shepherd. “The 1960 Federation 
Meeting: A Study in Programing and Indexing by Computer.” 
To be published in Federation Proceedings, July 1960. 

Claire K. Schultz. “A Generalized Computer Method for Information 
Retrieval.” To be submitted to Journal of the Association of Com- 
puting Machinery. 

John J. O’Connor. “Information Retrieval by Univac and by Univac- 
Produced Non-mechanized System.” Technical report No. 18. 
Remington Rand Univac, Philadelphia. 

John J. O’Connor. “The Scan Column Index: A Bound Book Coordi- 
nates Index.” Technical report to be issued by Remington Rand 
Univac, Philadelphia. 

William J. Turanski. “Basic Concepts of Information Retrieval.” 
Part I, Remington Rand Technical Report 17. November 25, 1957. 












ENCLOSURES 
(Incorporated in committee files) 




















U-1645           Reliability in Data-Processing Systems
U-1363 (Rev.2)   Functions of Remington Rand Univac Data-Processing
                 Systems
U-1562A          Features of the New Univac File-Computer
                 Data-Automation System, Model I
                 Univac 120
U-1578           Electronic Data-Processing System
U-1616           Univac 1105 System
U-1797 (Rev.1)   Univac Solid-State 80 Magnetic Tape System
Pbx 75012        Systems Management: The total solution of a problem
                 through professional depth and physical plant
                 Military Engineering Division.
                 Mass Storage: Technical Developments
U-310 (Rev.2)    Univac Electronic Data-Processing Systems for the U.S.
                 Government
U-676 (Rev.2)    Remington Rand Univac-Punched-Card Accounting
                 Machines for use by the U.S. Government

























Smith Kline & French Laboratories (SK&F)


In response to the staff’s request for information relative to the de- 
velopment of machine systems for information retrieval, Mr. Henry 
C. Longnecker, manager, science information department, and Mr. 
Robert Hayne, head, documentation section, of Smith Kline & French 
Laboratories, joined members of the staff in a conference held on 
February 4, 1960. At this conference, Mr. Longnecker submitted to 
the committee staff the following summary of the activities of the 
science information department:


The science information department of Smith Kline & 
French Laboratories was established to cope with problems 
which confront most research organizations, whether they be 
in government, academic, or industrial circles. These prob- 
lems have to do with the availability and use of recorded 
scientific information—published and unpublished—as a re- 
source in research. 

As we all know, the volume of published scientific material 
is now so large that present methods of handling it, including 
acquisition, analysis, and storage, seem inadequate for either 
alerting scientists to new developments or for effective re- 
trieval of information. In addition, research organizations 
of any size generate a great deal of information which, though 
it may not reach publication, likewise requires some kind of 
manipulation and organization for analysis and later
retrieval. However, the bench or clinical scientist usually has
little enough time to read current publications in his im- 
mediate field; much less to be able to find time to assume 
responsibilities for active work with information. As a re- 
sult, work with information has usually been left to indi- 
viduals who either lack adequate subject background, or 
who are so removed from the organization’s research efforts
that they are unable to provide them with creative support.

SK&F has organized its information facilities so that they 
can play a positive role in research and development planning 
and implementation, rather than a purely secondary, sup- 
porting role. This approach requires the use of highly 
trained scientists who are intimately conversant with the 
scientific and clinical areas of company interest. 

Although we at SK&F have pioneered in certain applica- 
tions of machines to information work and are well aware of 
their potential, we have attempted to keep the machine and 
the system in proper perspective as tools of the scientifically 
trained information specialist. We feel that an information 
system can only be as good as the scientific competence of 
those who use it. The work of the science information de- 
partment, briefly described in this report, deals with acquisi- 
tion, screening, and dissemination of material, and with its 
analysis, storage, and retrieval. The manual and machine 
systems used in these operations have been designed, not as
an end in themselves, but as methods of promoting critical 
evaluation and creative use of information by trained, ex- 
perienced scientists who devote full time to information 
activities. 

Departmental organization—The organization of the sci- 
ence information department is shown in the attached chart. 
Information processing—acquisition, screening, dissemina- 
tion, analysis, storage, and retrieval—is carried on in the 
documentation section. Critical evaluation and creative use 
of information is the job of senior scientists on the Ph. D. 
level in the physical sciences, biological sciences, and statisti- 
cal sections. 

Acquisition, screening, and dissemination.—As in most or- 
ganizations, acquisition of published material is handled by 
the library, one of the units of our documentation section. 
A large part of the material received consists of periodical 
literature. Obviously no bench scientist could hope to find 
time to scan any sizable fraction of the 600 journals we re- 
ceive, assuming he could read all the languages involved. 
Instead, a group of junior scientists in the documentation 
section screens all journals for information pertaining to ac- 
tive research and development projects, or to interests of in- 
dividual scientists in any part of the company. The effec- 
tiveness of this screening is obviously in direct proportion to 
the group’s knowledge of what is going on around them. 
They rely for the most part on frequent personal contact with
members of research and development teams. In addition, 
all formal reports on the status of research and development 
projects come to their attention. 

Dissemination of pertinent material is accomplished in 
several ways. First, material of specific interest to a team is 
routed to the SID team member in the physical sciences or 
biological sciences section. Next, material of general in- 
terest, whether or not directly related to current research and 
development projects, is abstracted and indexed for one of
several widely circulated internal information bulletins.
When the nature of the material warrants it, the junior scien- 
tist may contact the responsible senior personally. 

For the most part, we do not use machines in processing
the published literature for dissemination or for later re- 
trieval of information. Our mechanization is limited to the 
handling of internal raw data. For access to the current 
literature we depend upon the internal bulletins and indexes 
mentioned above. For retrieval of older data we must de- 
pend upon the use of the standard abstracting and indexing 
services, such as Chemical Abstracts, Index Medicus,
Biological Abstracts.

Preparation of any kind of information for machine han- 
dling is enormously time consuming. As seems to be the case 
with most private organizations, it is beyond our resources to
prepare any sizable portion of the published literature for a 
machine system that would give us a substantially better re- 
trieval than we have through use of the standard services. 

We have seen a number of mechanical systems for storage 
and retrieval of information from the literature, and all are 
limited in coverage by the labor required to prepare mate- 
rial for the system. The use of computers for this purpose 
makes the problem of input an even more difficult one. 
Literature systems using computers are for the most part 
systems designed originally for simpler sorting machines— 
or even for manual files. To our knowledge, none takes 
true advantage of the computer’s capacity for manipulation 
of information because it is generally found to be
economically impractical to prepare information in a form which
can take advantage of that capacity. We believe that effi- 
cient and economic use of machines for control of the litera- 
ture will come only with the solution of problems basic to the 
organization of information and to its analysis—and that 
this is, today, the primary problem which must be solved 
before machines in themselves can be used for more than 
simple retrieval of documents from collections of relatively 
limited size. 

Analysis, storage, and retrieval of internal data—The 
science information department is responsible for collection, 
analysis, and storage of the large amounts of internal labora- 
tory and clinical data generated by research and develop- 
ment project activities. A large part of this information is 
coded for punched card files which can be scanned rapidly 
by machine. At the present time we analyze and code for 
mechanical retrieval data on the physical properties, chemical 
structure, and biological effects of chemical compounds tested 
in the company’s laboratories. In addition, the system
includes data from case reports submitted by outside
investigators working with new drugs.

Critical evaluation and creative use.—The routine infor- 
mation processing described above is largely in the hands 
of junior scientists on the M.S. level. Junior scientists also 
staff a literature searching group, which acts as a
clearinghouse for information requests from any area of the company.
The work of these junior scientists supports senior scientists 
on the Ph. D. level, who, as members of our research and de- 
velopment teams, are responsible for critical evaluation and 
creative use of information in furthering the teams’ activi- 
ties. The necessary preoccupation of the bench or clinical sci- 
entist with project work, and his limited knowledge of infor- 
mation techniques called for this new member on our profes- 
sional staff—namely, the senior information scientist, a 
scientist with the same order of professional qualifications as 
his laboratory and clinical associates.

All of the technical information, experimental and clinical, 
accumulated on a project is passed along to the SID
representative on the team. In integrating and evaluating this
information he calls upon the services of the machine data
processing group, as well as his own scientific background,
his skills in analyzing and interpreting data, and his ability
to communicate the results to them. All of these facilities
put him in an excellent position to recognize information of 
importance that might otherwise be overlooked. 

Compared to the benchman whose reading must be limited 
to specific areas, the literature scientist is exposed to a wide 
spectrum of information. Thus, with a realization that the 
key to the solution of many problems may lie buried in the 
literature or in already accumulated laboratory data, he is 
in a most advantageous position to develop new ideas and 
solutions. This potential is not limited to the project team 
members, but applies to all senior and junior scientists who 
deal with information. 













[Organization chart: Smith Kline & French Laboratories, Research and
Development Division, Science Information Department. The department
comprises the Physical Sciences Section, the Statistics Section, the
Biological Sciences Section, and the Documentation Section. Phases of
information handling shown: 1. Acquisition, storage and retrieval;
2. Screening and dissemination; 3. Critical evaluation and creative use.]




Stanford Research Institute (SRI)


On March 10, 1960, in response to a request from the staff, Mr. 
William D. McGuigan, assistant director of engineering research, 
SRI, forwarded the following comments: 


I have delayed answering your letter regarding Stanford 
Research Institute’s activities in the information retrieval 
field because rapid developments in this subject justify a 
reorientation of our views and programs. 

First, it should be clear that Stanford Research Institute 
does not have any special information retrieval system. We 
have rather worked on many systems and many components 
for systems which are either used by or available from our 
various clients. None of these systems, however, appear 
to solve the problem of a scientific information retrieval 
center. 

Secondly, you should be aware of some brilliant and sur- 
prisingly simple techniques for language translation which 
have developed recently and which make it possible to trans- 
late Russian documents with only one large computer. The 
man who can explain this and to whom a great deal of credit 
is due is Mr. Paul Howerton, Deputy Assistant Director of 
the CIA. The essence of his story would be that “we know 
how to translate all of the documents we can get our hands 
on and the process involves an IBM 705 system translating 
50,000 words per hour from Russian to English.” 

While the scientific techniques for this system of transla- 
tion are known and being used, much remains to be done to 
improve the system. Specifically, machines are needed for 
automatic reading of the documents to eliminate or reduce 
the need for key-punch operators at the computer input. 

A more fundamental problem, however, concerns the com- 
puter output with the question “What is it that we want from 
these systems?” or “What kind of information do we need?”
The tendency is to process everything with the assumption 
that someone might find it of interest. Our library, there- 
fore, becomes cluttered with a great deal of information 
which is either redundant or useless. Thus, research is 
needed to improve the finesse and efficiency with which we 
use our libraries. 

With specific regard to your comments about NSF and its 
appropriateness in this type of effort, it should be noted that 
NSF is much too limited in its scope to get anywhere on a 
problem of this type. The reasons are many, but primarily 
they lie in the fact that NSF is concerned only with the basic 
research part of the research spectrum, and supports in this 
portion of the spectrum only a few individuals on a grant 
basis. Information retrieval is an excellent example of a 
subject which requires applied research, done by larger 
groups, with more depth of personnel, laboratory equipment,
and research management than is supported by the
NSF system. 





Western Reserve University (WRU)


As requested by the staff of the committee, Mr. Allen Kent,
associate director, Center for Documentation and Communication
Research, School of Science, WRU, supplied the following
information relative to the mechanized information retrieval program
being developed by WRU:


As you requested, I have prepared a brief outline of our 
mechanized information retrieval program for use in con- 
nection with your study of science information processing 
systems in the United States. 

I am enclosing two additional items⁸ that you may find
useful in your deliberations: 

(1) “Machine Literature Searching in Science,” a 
paper that I presented before the American Associa- 
tion for the Advancement of Science. 

(2) My book, “Centralized Information Services— 
Opportunities and Problems,” which provides a goodly 
amount of survey data in this field. 


MACHINE LITERATURE SEARCHING AT THE CENTER FOR DOCU- 
MENTATION AND COMMUNICATION RESEARCH, WESTERN RE- 
SERVE UNIVERSITY, CLEVELAND, OHIO 


(By Allen Kent, associate director) 


The Center for Documentation and Communication Re- 
search was established in 1955 at Western Reserve University 
as the research arm of the school of library science. The 
overall long-range goals of the center have been the develop- 
ment and application of methods, equipment, and techniques 
that will facilitate the communication and utilization of 
recorded knowledge. The center is carrying out a
four-pronged program of education, research, liaison, and
operational services in order to achieve its objectives.

The basis for experimental, pilot, and operational machine
searching activities has been laid in the development of a set
of principles for coping with scientific and technical
literature, principles which have also appeared capable of
extension to the fields of law, history, and economics.


8 Made a part of the committee’s files for reference.
9 For a review of the work of the center, see the following books:
(a) Shera, J. H., Allen Kent, and J. W. Perry (eds.), “Documentation in
Action,” Reinhold Publishing Corp., New York, 1956.
(b) Peakes, Gilbert, Allen Kent, and J. W. Perry (eds.), “Progress Report
in Chemical Literature Retrieval,” Interscience Publishers, New York,
1957.
(c) Shera, J. H., Allen Kent, and J. W. Perry (eds.), “Information Systems
in Documentation,” Interscience Publishers, New York, 1957.
(d) Perry, J. W., and Allen Kent, “Documentation and Information Retrieval:
An Introduction to Basic Principles and Cost Analysis,” Interscience
Publishers, New York, 1957.
(e) Perry, J. W., Allen Kent, and Madeline M. Berry, “Machine Literature
Searching,” Interscience Publishers, New York, 1956.
(f) Shera, J. H., Allen Kent, and J. W. Perry (eds.), “Information Resources:
A Challenge to American Science and Industry,” Western Reserve
University Press and Interscience Publishers, New York, 1958.
(g) Casey, R. S., J. W. Perry, M. M. Berry, and Allen Kent (eds.), “Punched
Cards,” 2d edition, Reinhold Publishing Corp., New York, 1958.
(h) Kent, Allen, and J. W. Perry, “Centralized Information Services,” Western
Reserve University Press and Interscience Publishers, New York, 1958.








The principles developed provide a number of degrees of
flexibility and corresponding degrees of control over and
above what is obtainable with traditional information
retrieval systems. The procedures make it possible to record
information of interest that is virtually unlimited in degree
of detail. In practice, the information contained in a
detailed, informative abstract is selected for subsequent
encoding and availability for machine searching.¹⁰

The center is applying and has applied the literature
searching principles that it has developed in a number of
programs:

(1) Metallurgy (sponsored by the American Society for
Metals): An operational machine searching service is now
being offered widely to industry and government.¹¹ The core
metallurgical literature from throughout the world is
abstracted, encoded, and searched on a subscription basis.

(2a) Inorganic chemistry.

(2b) Solid state physics.

(2c) Mechanical engineering.

(2d) Geology: NSF has provided a grant of $159,200 for
the first year of a 2½-year program entitled: “Test Program
for Evaluating Procedures for the Exploitation of Literature
of Interest to Metallurgists.” Part of these funds will be
used to augment the analysis-encoding-searching program
being conducted for the American Society for Metals.
Additional literature in subjects impinging on metallurgy, e.g.,
inorganic chemistry, physics, mechanical engineering and
geology, will be analyzed and encoded. Patents, Government
reports, company reports, technical-trade publications,
and dissertations will also be encoded, in order to augment the
literature obtained from periodical publications. The first
report on this project entitled “Exploitation of Recorded
Information. 1. Development of an Operational Machine
Searching Service for the Literature of Metallurgy and
Allied Subjects,” was submitted to the committee by reference.

(3) Diabetes: A grant has been received from the American
Diabetes Association for the development of a system of
abstracting, encoding and machine searching of the diabetes
literature.

(4) Disease vector control: A grant has been received
from the National Institutes of Health for the development
of a system of abstracting, encoding, and machine searching
of the literature in this field.

(5) Electrical engineering: The center has engaged in a
cooperative program with the Applied Research Laboratory of
the University of Arizona in the abstracting, encoding, and
machine searching of literature in electrical engineering and
related fields. The University of Arizona is prime contractor
for this effort with the U.S. Army Electronic Proving
Ground, Fort Huachuca, Ariz. Our center has cooperated
with Arizona’s Applied Research Laboratory by providing
training in methods and by participating in solving various
coding problems.


10 See J. W. Perry and Allen Kent, “Tools for Machine Literature Searching,”
Interscience Publishers, New York, 1958.
11 See Newsletter No. 8, “Center for Documentation and Communication Research,”
Western Reserve University, Cleveland, Ohio, December 1959.

(6) Law: The center, in cooperation with the School of 
Law of Western Reserve University, is encoding for a ma- 
chine searching demonstration and test, a portion of the Uni- 
form Commercial Code (U.C.C.) of Ohio. Also included 
are a number of cases which have a direct bearing on various 
sections of the U.C.C. In addition, a conventional subject 
index will be prepared for the same body of material so that 
comparative searches may be conducted. 

(7) Ordnance: In a project conducted for development 
and proof services, Aberdeen Proving Ground, procedures 
were developed for the preparation of telegraphic abstracts 
of ordnance reports.¹² The basis for the extension of the
semantic code dictionary to the field of ordnance has been 
laid in the program carried out by the center for the Ord- 
nance Engineering Design Handbook Office at Duke Uni- 
versity, Durham, N.C. This program has resulted in the 
compilation of a glossary of ordnance terminology,¹³ and of a
file of analyzed terminology ready for code establishment. 

(8) Economics: A test sample of 1,000 articles was selected 
by scanning the New York Times Index for 1955, under 
headings likely to yield information of interest to the metals 
industry. The articles selected were analyzed and encoded 
using procedures analogous to those developed for the Ameri- 
can Society for Metals. 

Experimental searches have been conducted to illustrate the 
feasibility of the automatic correlation of commercial in- 
telligence materials. 


Machine searching equipment 

The projects described above have been based on literature 
encoding and searching principles which made it advantage- 
ous to develop a special-purpose computer—the Western 
Reserve University searching selector. The high-speed
version of this equipment, the GE-250, is now being
constructed by the General Electric Co. computer department.


Operations 


The decision was reached during 1958 that the center 
should work toward the creation of a model center for the 
mechanized exploitation of scientific and technical litera- 
ture. Until that time the functions of the center had been 
limited to research and development activities. However, it 
was felt that in order to assure continuous enrichment of the 
basic research program of the center it was necessary to 
engage in the processing and exploitation of a significantly 
large corpus of literature. This activity would provide the 
necessary research materials and also the facilities which
would permit a more direct relationship between the
educational and research functions through the medium of
internships. In this way the developing model center might
be considered in the same relationship to the school of library
science as a university hospital is to its school of medicine.


12 Allen Kent and J. W. Perry, “New Indexing-Abstracting System for Formal Reports,
Development and Proof Services, Aberdeen Proving Ground,” American Documentation,
13 “Glossary of Ordnance Terms,” preliminary edition, Ordnance Engineering Handbook
Office, Duke University, Durham, N.C., June 1959, 323 pp.

Operational searching will be based on the GE-250, the 
high-speed version of the WRU searching selector, which the 
center has on order from the computer department of the 
General Electric Co. 


Basic research 


The center is currently engaged in a number of theoretical 
and basic research investigations which are expected to lead to 
the formulation of a theory of documentation and of search- 
ing strategy. A basic program, being conducted for the Air 
Force Office of Scientific Research, is leading to the develop- 
ment of a mathematical model for a documentation system. 
When this model has been formulated, it will be used to 
develop reliable procedures for comparative searching tests to 
measure and to evaluate the performance of various informa- 
tion retrieval systems. 

Another program involves comparative tests of notation 
systems for organic chemical compounds. This study is being 
conducted for the National Science Foundation, under the 
auspices of the National Academy of Sciences—National 
Research Council. Four teams of three chemists each were 
asked to apply the Dyson and the Wiswesser notations to 
encode 1,000 compounds and later to decode the ciphers for 
another 1,000 chemical compounds. The errors and types of 
errors made by each chemist have been counted, analyzed, and 
tabulated in a report by A. Pratt and J. W. Perry, “Chemical 
Notation Study, Phase Report” (45 pp., August 1, 1959).


International Conference


On September 6-12, 1959, Western Reserve University co- 
sponsored with the Rand Development Corp. an Interna- 
tional Conference for Standards on a Common Language for 
Machine Searching and Translation. More than 200 persons 
from 10 countries (Brazil, France, India, Italy, Japan,
Netherlands, United Kingdom, United States, U.S.S.R., and West
Germany) heard 55 formal papers reviewing work in
progress in machine literature searching, machine translation, and
language studies for machine searching, correlation, and 
translation. 

Proposals were made for intermediate, common, and uni- 
versal machine languages, for interconvertibility among lan- 
guages, and for advanced application of computer informa- 
tion systems in behavioral systems and in the automation of 
the research process. 

A committee representing 10 countries was named at the 
closing session of the Conference. The group will continue 
the work of the Conference through investigations in this 
field under the four main headings of research, nomenclature,
exchange of materials and information, and exchange of
personnel. 

Elected president of the committee was Mr. Brian Vickery 
of the Imperial Chemical Industries of Great Britain. Allen 
Kent, associate director of WRU’s Center for Documentation 
and Communication Research, was elected general secretary. 
Vice presidents are J. Dekker of the Netherlands, S. R. 
Ranganathan of India, Rudolf Bolting of Brazil, and a repre- 
sentative, yet to be named, of the U.S.S.R. Sponsorship will 
be sought by the committee from existing agencies such as the 
International Standards Organization and the United Na- 
tions. 

A summary statement of the Conference was also submitted 
for the committee’s files. The proceedings of the Conference 
are to be published in 1960 (Interscience Publishers, Inc., New
York).





